Cloud-Enabled Architecture For On-Demand Native Application Crawling

ABSTRACT

A search system includes a query receiver, a live scrape system, and a content provider. The query receiver is configured to receive a first query related to a first application state of a first application. The first query is based on a user-requested search from a user device. The live scrape system is configured to operate a device including an operating system in which an instance of the first application executes. The live scrape system is configured to, responsive to the first query, and within the device, (i) navigate to the first application state in the instance of the first application and (ii) extract content from the instance of the first application. The content provider is configured to provide the extracted content to the user device for presentation to a user of the user device in real time.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 14/981,244 filed on Dec. 28, 2015 and U.S. patent application Ser. No. 14/981,300 filed on Dec. 28, 2015, which both claim the benefit of U.S. Provisional App. No. 62/204,974, filed on Aug. 13, 2015. The entire disclosures of the applications referenced above are incorporated by reference.

FIELD

The present disclosure relates to crawling apps for content, and more particularly to crawling mobile apps for content.

BACKGROUND

Search engines are an integral part of today's world. A key component of a search engine is the collection of search indices that power the search. In the context of a search engine, a search index can be an inverted index that associates keywords or combinations of keywords to documents (e.g., web pages) that contain the keyword or combination of keywords. In order to generate and maintain these search indexes, most search engines use crawlers to identify documents and information within the documents. A traditional crawler requests a document from a content provider and the content provider provides the requested document to the crawler. The crawler then identifies and indexes the keywords and combinations of keywords in the document.

As the world transitions to a mobile-based architecture, the way content providers provide access to their content is changing. User devices can access content using a variety of different mechanisms. For example, user devices can obtain content from a content provider using a native app dedicated to accessing a software app of the content provider or a web browser that accesses the software app. Furthermore, content providers may allow access to different content depending on the geographic region of a user device, the type of user device, the time of day, and/or the operating system of the user device. For these and other reasons, crawling has become an increasingly difficult task.

SUMMARY

A search system includes a query receiver, a live scrape system, and a content provider. The query receiver is configured to receive a first query related to a first application state of a first application. The first query is based on a user-requested search from a user device. The live scrape system is configured to operate a device including an operating system in which an instance of the first application executes. The live scrape system is configured to, responsive to the first query, and within the device, (i) navigate to the first application state in the instance of the first application and (ii) extract content from the instance of the first application. The content provider is configured to provide the extracted content to the user device for presentation to a user of the user device in real time.

In other features, the device comprises one of an emulator configured to execute the operating system on emulated hardware and a simulator configured to simulate interaction of the operating system with the first application. In other features, the live scrape system is configured to, responsive to the first query, and within the device, (i) navigate to the first application state in the instance of the first application, (ii) supply search parameters based on the first query to the first application state, (iii) initiate a search action from the first application state, and (iv) extract content from search results provided by the instance of the first application in response to the search action.

In other features, the search system includes a query understanding pipeline configured to parse the user-requested search and identify a set of entity types provided in the user-requested search. The first query is generated in response to the identified set of entity types from the user-requested search. In other features, the user-requested search includes a text string. The query understanding pipeline is configured to identify one or more substrings of the text string, each substring corresponding to an entity having a corresponding entity type.

In other features, the search system includes a rules-based search module configured to store a set of rules, each rule designating (i) an application, (ii) an application state of the designated application through which a search can be performed, and (iii) for each input parameter of the search, a type of entity that can be provided as the input parameter. The rules-based search module is configured to select a first rule from the set of rules based on a match between the designated entity types for the first rule and the identified set of entity types from the user-requested search. The rules-based search module is configured to generate the first query based on the first rule.
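For example only, the rule selection described above may be sketched as follows. The sketch is illustrative and non-limiting: the names SearchRule, select_rule, and build_query, the example app and state names, and the exact-match criterion are all assumptions introduced here and are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchRule:
    app: str                     # application designated by the rule
    state: str                   # app state through which a search can be performed
    param_types: dict[str, str]  # input parameter name -> entity type it accepts


def select_rule(rules: list[SearchRule], entities: dict[str, str]) -> Optional[SearchRule]:
    """Return the first rule whose designated entity types equal the identified ones.

    `entities` maps each identified entity type to the matching substring of the
    user-requested search. Exact set equality is an assumed matching criterion.
    """
    identified = set(entities)
    for rule in rules:
        if set(rule.param_types.values()) == identified:
            return rule
    return None


def build_query(rule: SearchRule, entities: dict[str, str]) -> dict:
    """Generate a query for the live scrape system from the selected rule."""
    params = {name: entities[etype] for name, etype in rule.param_types.items()}
    return {"app": rule.app, "state": rule.state, "params": params}


# Hypothetical rules and a hypothetical parsed search.
RULES = [
    SearchRule("example_movie_app", "showtimes_search", {"title": "movie"}),
    SearchRule("example_food_app", "restaurant_search", {"cuisine": "cuisine", "near": "city"}),
]
ENTITIES = {"cuisine": "thai", "city": "Mountain View"}
```

In this sketch, a search identified as containing a cuisine entity and a city entity selects the restaurant search rule, and each entity is supplied as the corresponding input parameter.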

In other features, the extracted content is provided to the user device in a form of at least one deep view card (DVC). Each DVC of the at least one DVC includes content to be displayed to the user including an identification of the first application, an identification of the first application state, and information extracted from the first application state. In other features, the search system includes a DVC builder configured to format the extracted content into the at least one DVC. The search system includes a DVC template store configured to store a first template that specifies which portions of the extracted content correspond to fields in the at least one DVC.

In other features, the search system includes a query router configured to, responsive to the query receiver receiving a second query, (i) determine whether results data corresponding to the second query have been cached, (ii) in response to the results data having been cached, selectively provide the results data to the content provider, and (iii) in response to the results data not having been cached, selectively provide the second query to the live scrape system. In other features, the query router is configured to, responsive to the query receiver receiving the second query, selectively provide the second query to the live scrape system in response to the cached results data being out of date.

In other features, the search system includes a query router configured to, responsive to the query receiver receiving a second query, (i) in response to an application programming interface (API) being available for an application state specified by the second query, selectively provide the second query to an API passthrough to be serviced by the available API and (ii) in response to no API being available for the application state specified by the second query, selectively provide the second query to the live scrape system.

In other features, the search system includes a query router configured to, responsive to the query receiver receiving a second query, (i) in response to a web edition of an application state specified by the second query being available via HTTP (hypertext transfer protocol), selectively provide the second query to a web scrape system to be serviced by the available web edition and (ii) in response to no web edition being available for the application state specified by the second query, selectively provide the second query to the live scrape system.
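For example only, the routing features described above may be combined as in the following sketch. The disclosure presents the cache, API passthrough, and web scrape checks as separate features; the priority order used here, along with the names Route and route_query and the max_age_s freshness threshold, are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class Route(Enum):
    CACHE = "cache"
    API_PASSTHROUGH = "api_passthrough"
    WEB_SCRAPE = "web_scrape"
    LIVE_SCRAPE = "live_scrape"


def route_query(
    cached_at: Optional[float],  # time the results were cached, or None if not cached
    now: float,                  # current time, in the same units as cached_at
    max_age_s: float,            # assumed threshold after which cached data is out of date
    api_available: bool,         # an API exists for the specified application state
    web_available: bool,         # a web edition of the state is reachable via HTTP
) -> Route:
    """Choose where to send a query; the live scrape system is the fallback."""
    if cached_at is not None and now - cached_at <= max_age_s:
        return Route.CACHE
    if api_available:
        return Route.API_PASSTHROUGH
    if web_available:
        return Route.WEB_SCRAPE
    return Route.LIVE_SCRAPE
```

In this sketch, results cached more than max_age_s ago are treated the same as results that were never cached, so the query falls through to the remaining checks.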

A computer-implemented method includes receiving a first query related to a first application state of a first application. The first query is based on a user-requested search from a user device. The method includes operating a device including an operating system in which an instance of the first application executes. The method includes, responsive to the first query, and within the device, (i) navigating to the first application state in the instance of the first application and (ii) extracting content from the instance of the first application. The method includes providing the extracted content to the user device for presentation to a user of the user device in real time.

In other features, operating the device comprises one of operating an emulator configured to execute the operating system on emulated hardware and operating a simulator configured to simulate interaction of the operating system with the first application. In other features, the method includes, responsive to the first query, and within the device, (i) navigating to the first application state in the instance of the first application, (ii) supplying search parameters based on the first query to the first application state, (iii) initiating a search action from the first application state, and (iv) extracting content from search results provided by the instance of the first application in response to the search action.

In other features, the method includes parsing the user-requested search. The method includes identifying a set of entity types provided in the user-requested search. The first query is generated in response to the identified set of entity types from the user-requested search. In other features, the user-requested search includes a text string. The method includes identifying one or more substrings of the text string. Each substring corresponds to an entity having a corresponding entity type.

In other features, the method includes storing a set of rules, each rule designating (i) an application, (ii) an application state of the designated application through which a search can be performed, and (iii) for each input parameter of the search, a type of entity that can be provided as the input parameter. The method includes selecting a first rule from the set of rules based on a match between the designated entity types for the first rule and the identified set of entity types from the user-requested search. The method includes generating the first query based on the first rule.

In other features, the extracted content is provided to the user device in a form of at least one deep view card (DVC). Each DVC of the at least one DVC includes content to be displayed to the user including an identification of the first application, an identification of the first application state, and information extracted from the first application state. In other features, the method includes formatting the extracted content into the at least one DVC. The method includes storing DVC templates including a first template. The first template specifies which portions of the extracted content correspond to fields in the at least one DVC.

In other features, the method includes, in response to receiving a second query, (i) determining whether results data corresponding to the second query have been cached, (ii) in response to the results data having been cached, selectively providing the cached results data to the user device, and (iii) in response to the results data not having been cached, selectively navigating to an application state based on the second query, extracting content, and providing the extracted content to the user device. In other features, the selectively navigating to an application state based on the second query, extracting content, and providing the extracted content to the user device are performed in response to the cached results data being out of date.

In other features, the method includes, in response to receiving a second query corresponding to a second application, in response to an application programming interface (API) being available for an application state specified by the second query, selectively (i) preparing an API call, (ii) transmitting the API call to a backend server for the second application, and (iii) providing content to the user device based on a response to the API call from the backend server. The method includes, in response to receiving a second query corresponding to a second application, in response to no API being available for the application state specified by the second query, selectively navigating to an application state based on the second query, extracting content, and providing the extracted content to the user device.

In other features, the method includes, in response to receiving a second query corresponding to a second application, in response to a web edition of the second application being available, selectively (i) sending an HTTP (hypertext transfer protocol) request to a backend server for the second application, (ii) receiving an HTTP response from the backend server, (iii) scraping data from the HTTP response, and (iv) providing content to the user device based on the scraped data. The method includes, in response to receiving a second query corresponding to a second application, in response to no web edition being available for the second application, selectively navigating to an application state based on the second query, extracting content, and providing the extracted content to the user device.

Further areas of applicability of the present disclosure will become apparent from the detailed description, the claims and the drawings. The detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure will become more fully understood from the detailed description and the accompanying drawings.

FIG. 1A is an example of a graphical user interface according to the principles of the present disclosure.

FIG. 1B is another example of a graphical user interface.

FIGS. 2A-2C are graphical representations depicting app states as combinations of functions and entities.

FIG. 3A is a graphical representation of an example functional ontology.

FIG. 3B is a simplified graphical example of linking application-specific entity information to a canonical entity.

FIG. 3C is a graphical example of conversion from functional URLs to access URLs.

FIG. 4 is a graphical example of conversion from functional URLs to dynamic access URLs.

FIG. 5A is a graphical representation of an example app state record format.

FIG. 5B is a graphical representation of an example app state record according to the format of FIG. 5A.

FIG. 6A is a graphical representation of an example app record format.

FIG. 6B is a graphical representation of an example app record according to the format of FIG. 6A.

FIGS. 7A-7B are block diagrams of example implementations of a search system according to the principles of the present disclosure.

FIG. 8A is a functional block diagram of an example implementation of the dynamic acquisition module of FIG. 7A.

FIG. 8B is a functional block diagram of an example implementation of an arrangement for search system API call creation and client API invocation.

FIG. 9 is a functional block diagram of an example search system service.

FIG. 10 is a functional block diagram of an example implementation of the live scrape system of FIG. 9.

FIG. 11A is a functional block diagram of an example implementation of an emulation server of FIG. 10.

FIG. 11B is a functional block diagram of another example implementation of an emulation server.

FIG. 12 is a functional block diagram of an example implementation of the cache manager of FIG. 9.

FIG. 13 is a high-level functional block diagram of communication within an example environment where the search system operates.

FIG. 14 is a flowchart showing example operation of integrating new apps into the search system.

FIG. 15 is a flowchart showing example operation relating to routing a received query.

FIG. 16 is a flowchart showing example operation of a cache system according to the principles of the present disclosure.

In the drawings, reference numbers may be reused to identify similar and/or identical elements.

DETAILED DESCRIPTION

In order to present content from mobile applications (referred to interchangeably as “apps”), a search system generally crawls and scrapes the mobile apps to extract the content and index the content. This requires significant resources, both at the time of incorporating a new app into the search system as well as when updating the search system to acquire additional or changed content from the app. The expense may be in terms of human operator time, processing time, and storage space.

Some apps may present application programming interfaces (APIs), which can be queried by the search system to retrieve results in real time. By obtaining results in real time, upfront processing and storage costs can be reduced. However, many apps do not present APIs accessible to a third party. As a result, these apps are currently crawled, scraped, stored, and indexed. Further, some apps have no static content to be scraped. Search results related to these apps may therefore be lacking rich content.

For example only, in FIG. 1A, a search system 100 is shown providing results to a device 104. The device 104 is shown as a smartphone, but could be any other type of user device, such as a laptop, tablet, smartwatch, or desktop computer. A query of “thai” is received from a user via text box 106 and sent from the device 104 to the search system 100 within a query wrapper. Results, commonly referred to as deep view cards, are returned by the search system 100 to the device 104.

The results indicate apps and specific states (or, screens) of apps that are relevant to the user's query. As described below, the user's query may be based on other inputs than a text box. For example, the user's query may be inferred from a button press, or from other interaction with the user interface. Further, queries may be generated by another service on behalf of the user, such as by a search aggregator platform that provides search results from various search systems.

As seen in FIG. 1A, the device 104 displays rich search results based on the WIKIPEDIA online encyclopedia app and the YELP restaurant review app. The rich information presented is referred to as a deep view card (DVC). A DVC will show more than the identification of the app or app state, and will also include other metadata.

For example, the metadata may include a title of the app state, or a description of the app state, which may be a snippet of text from the app state. Other metadata may be provided from the app state, including images, location, number of reviews, average review, and status indicators. For example, a status indicator of “open now” or “closed” may be applied to a business depending on whether the current time is within the operating hours of the business.
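For example only, the status indicator may be derived as in the following sketch. The function name status_indicator, the representation of operating hours as a single opening time and closing time, and the handling of hours that extend past midnight are assumptions made for illustration.

```python
from datetime import time


def status_indicator(now: time, opens: time, closes: time) -> str:
    """Return "open now" if `now` falls within the operating hours, else "closed"."""
    if opens <= closes:
        # Hours within a single day, e.g. 11:00 to 22:00.
        is_open = opens <= now < closes
    else:
        # Hours that wrap past midnight, e.g. 18:00 to 02:00.
        is_open = now >= opens or now < closes
    return "open now" if is_open else "closed"
```

The current time would be the local time of the business, which the sketch assumes has already been determined.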

Another feature of some DVCs may be that information (such as text) that led to the DVC being selected as a search result may be highlighted. In the example of FIG. 1A, the word “Thai” is shown in bold.

The DVC may also include direct actions, such as the ability to call an establishment or to get navigation directions to the establishment. Other interactions with the DVC may take the user to the indicated state. As described in more detail below, this may be accomplished by opening the relevant app or, if the app is not installed, opening a website related to the desired app state. In other implementations, an app that is not installed may be downloaded, installed, and then executed in order to reach the desired app state.

In other words, a DVC includes an indication of the app state as well as additional content from the app state itself. The additional content allows the user to make a more informed choice about which result to choose, and may even allow the user to directly perform an action without having to navigate to the app state. If the action the user wants to take is to obtain information, in some circumstances the DVC itself may provide the necessary information to accomplish such action.

A DVC 112-1 for “Thai Cuisine” corresponds to the WIKIPEDIA app. In one nomenclature, the “Thai Cuisine” DVC is considered to be an element of a Wikipedia deep view (DV). In other words, a DV corresponds to an app and can include multiple DVCs. For example, outlines of multiple DVCs per DV are shown in FIG. 1B.

For example, the Wikipedia DV may include the DVC 112-1 for “Thai Cuisine” as well as a DVC for “Thai language.” As another example, the Yelp DV may include DVCs for multiple restaurants. However, the UBER ride sharing app may have no static states that can be scraped. Instead, each DVC for an Uber DV corresponds to an available ride. Meanwhile, the DVCs for Wikipedia and Yelp may be static states that could be scraped with an investment of time, processing power, and memory storage.

As yet another example, the GASBUDDY gas station pricing app does not have static states and also does not offer a third party API. In order to provide DVC content for the GasBuddy app, the present disclosure implements on-demand app crawling. By executing the GasBuddy app within an emulator controlled by the search system 100, the content displayed by the GasBuddy app can be scraped on demand and provided as DVC content by the search system.

The emulator may be configured to provide context based on, for example, a location of the device 104. In other words, the GasBuddy app executing within an emulator may be told (such as by forcing the GPS coordinates or other location service of the emulator) that the emulator's location is the same as the present location of the device 104. In this way, the results generated by the GasBuddy app are relevant to the location of the device 104.

Similarly, an emulator can execute the WIKIPEDIA app, perform a search for “THAI,” and provide DVC content obtained from the WIKIPEDIA app in real time. The term “real time” in this context generally requires that the delay between the user making a request and the results being returned is on the order of milliseconds or seconds, not tens of seconds, minutes, or hours. In other words, the user will not view the search app as unresponsive due to a long wait for results. Further, with the exception of caching, which may be available in some implementations for some searches, “real time” means that the results are procured directly in response to the request as quickly as possible and promptly displayed to the user.

In these respects, “real time” results differ from pre-scraped results that may be obtained ahead of time based on anticipated user queries. In other words, “real time” results are obtained from an app immediately following, and directly responsive to, a request from a live user who is expecting a prompt response to the query.

In FIG. 1B, an unsophisticated showtimes app 116 is shown running on the device 104. A first state of the showtimes app 116 is shown at 116-1 within a representation of the device 104-1. A second state of the showtimes app 116 is shown at 116-2 within a representation of the device 104-2. In the simple interface of the first state 116-1, a search bar 112 identifies a movie for which theater showtimes are being displayed (a fictional “The Hobbit XIII”).

The first state 116-1 displays theaters and showtimes for theaters located near the present location of the device 104 but may allow the user to change the desired location using a “change location” button 122. A variety of functional search buttons are provided, including a “find food” button 124. As an example only, a stylized “Q” may be displayed, which is associated with Quixey, Inc. of Mountain View, Calif.

When a user selects (such as by tapping their finger) the “find food” button 124, a query wrapper 136 is sent to the search system 100. The query wrapper 136 includes an indication that food results are desired, such as a text query of “restaurants.” The search system 100 interprets the query, gathers results, and provides deep view card (DVC) results 140 to the showtimes app 116. These results are shown in the second state 116-2.

The search system 100 may contextualize the query based on data known to the search system 100 or provided by the device 104. For example, the search system 100 may recognize that the present local time where the device 104 is located is late evening and therefore perform a search for “late night restaurants.” In addition, the search system 100 may provide results based on a location of the device 104. The location of the device 104 may be provided within the query wrapper 136 or may be determined in another way, such as based on IP geolocation.

In the second state 116-2, a query box 150 may indicate a textual representation of a performed query. Although the text shown reads “late night restaurants,” the initial text query of “restaurants” may instead be shown. In other implementations, no indication of the query is visible. In various implementations, the query box 150 may allow a user to modify the query and perform a new search.

The DVC results 140 from the search system 100 include a deep view (DV) 154 for the YELP app and a DV 158 for the TRIPADVISOR travel review app. While not shown with the actual rich data, placeholders for DVCs 162 and 166 are shown within the Yelp DV 154. The DVC 162 corresponds to a specific DENNY'S restaurant while the DVC 166 corresponds to a specific SONIC drive-in restaurant. The TripAdvisor DV 158 includes a DVC 170 for an IHOP restaurant and a DVC 174 for a Denny's restaurant, again shown only as outlines without the rich content generally present in DVCs. In various implementations, if the DVCs 162 and 174 are for the same Denny's restaurant, one of the DVCs 162 and 174 may be omitted from the displayed results to make room for other results (in this case, likely other restaurants).

Note that each of the DVs 154 and 158 includes multiple DVCs. Depending on quantity of search results, scores associated with the search results, and screen space, some DVs will have only a single DVC while some DVs may have more than two DVCs. For example, if the Yelp DV 154 was only provided with a single DVC, there may be additional room to display a third DVC for the TripAdvisor DV 158.

The order of the DVCs within the DV may be set based on an order established by the corresponding app. For example, the order of the DVCs 162 and 166 for the Yelp DV 154 may be in the same order in which the Denny's and Sonic locations would be shown when the search was performed within the Yelp app itself. In addition, the DVCs selected for presentation to the user of the showtimes app 116 may be the very first results within the Yelp app, indicating that the Yelp app considered those results to be most relevant.

In other words, the identity and order of the DVCs within a DV may be established by the app itself. Meanwhile, the ordering of the DVs within the presented search results may be based on a number of factors, including user engagement with an app (an app that receives more clicks when presented to the user may increase in the rankings) and reliability of the app (applications that historically return results less responsive to user queries may have decreased rankings).

Entities

In FIG. 2A, an example is presented of how functions and entities can be combined to arrive at app states. A function of “Buy Tickets” 204-1 and an entity of “The Dark Knight Rises” 204-2 combine to create a state 204-3 for buying tickets to “The Dark Knight Rises.”

Similarly, in FIG. 2B, a function of “Get Ride” 208-1 and a location entity (such as WiFi-assisted GPS) of the device 208-2 combine to create a state 208-3 for getting a ride/taxi from the present location.

In FIG. 2C, a function “Find Flight” 212-1 combines with an entity “LAX” (Los Angeles Airport) 212-2 and an entity “SFO” (San Francisco Airport) 212-3 to create a state 212-4 for finding a flight from LAX to SFO. Note that the entities 212-2 and 212-3 in the final example may serve as either departure location or arrival location. If there is no indication of which entity is which, an alternative state (finding a flight from SFO to LAX) is possible.
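For example only, the ambiguity in FIG. 2C may be illustrated by enumerating every assignment of the entities to the roles of the function. The function name candidate_states and the role labels "from" and "to" are assumptions made for illustration.

```python
from itertools import permutations


def candidate_states(function: str, roles: list[str], entities: list[str]) -> list[tuple[str, dict[str, str]]]:
    """Enumerate possible states when it is unknown which entity fills which role."""
    return [
        (function, dict(zip(roles, ordering)))
        for ordering in permutations(entities, len(roles))
    ]


# Two airport entities with no indication of which is the departure location.
STATES = candidate_states("Find Flight", ["from", "to"], ["LAX", "SFO"])
```

In this sketch, STATES holds two candidate states, one for a flight from LAX to SFO and one for a flight from SFO to LAX. A function that takes a single entity, such as “Buy Tickets,” yields exactly one state.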

In FIG. 3A, a function ontology 220 includes a number of branch nodes (such as video 224-1 and restaurants 224-2), which may represent different industry verticals. The function ontology 220 includes leaf nodes (such as buy tickets 228-1, rent video 228-2, reviews 228-3, and showtimes 228-4) that each correspond to a particular function offered by one or more apps. Leaf nodes underneath the restaurants branch node 224-2 include reviews 232-1, business info (such as hours and address) 232-2, and reservations 232-3. Further branch nodes may be specified, such as a buy branch node 236 under which leaf nodes download 240-1 and stream 240-2 are specified. The rent leaf node 228-2 could be similarly expanded. Each of the functions of the function ontology 220 may be accomplished by a number of different apps. For example, video reviews 228-3 are available from, among many other sources, an IMDB app, a NETFLIX app, and a GOOGLE PLAY app.

Access templates may be determined for the functions of each app. An information triplet (the app, the function the app is to perform, and the entity for which the function will be performed) specifies a state. When this triplet is formatted as a URL (uniform resource locator), the URL may be termed a functional URL. Functional URLs may be used with a namespace “func://” to differentiate the functional URL from a standard web URL in the “http://” namespace. Action templates define what types and number of entities are required to specify a functional URL for a given app function.

Examples of access templates for video reviews 228-3 are shown at 244-1, 244-2, 244-3, and 244-4. The access template 244-1 for IMDb includes the namespace, “func://,” an indication of the app, “imdb.com,” a separator forward slash, an indication of the function to be performed, “movie reviews,” another separator forward slash, and then an indicator of an entity type, which is a movie_id.
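For example only, the formatting of the app, function, and entity triplet into a functional URL may be sketched as follows. The helper names functional_url and parse_functional_url, the function token spelling movie_reviews, and the entity ID m123 are assumptions made for illustration.

```python
FUNC_NAMESPACE = "func://"


def functional_url(app: str, function: str, entity_id: str) -> str:
    """Format the (app, function, entity) triplet as a functional URL."""
    return f"{FUNC_NAMESPACE}{app}/{function}/{entity_id}"


def parse_functional_url(url: str) -> tuple[str, str, str]:
    """Split a functional URL back into its (app, function, entity) triplet."""
    if not url.startswith(FUNC_NAMESPACE):
        raise ValueError("not in the func:// namespace")
    parts = url[len(FUNC_NAMESPACE):].split("/")
    if len(parts) != 3:
        raise ValueError("expected app/function/entity")
    return parts[0], parts[1], parts[2]


# Hypothetical function token and entity ID.
EXAMPLE_URL = functional_url("imdb.com", "movie_reviews", "m123")
```

The namespace check in this sketch is what distinguishes a functional URL from a standard web URL in the “http://” namespace.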

Therefore, in order to use the access template 244-1, a movie entity must be identified. Further, as described in more detail below, the identified movie entity must be applicable to IMDb. In general terms, this means that if the movie is not present in the IMDb app, the access template 244-1 will be irrelevant. In technical terms, when attempting to instantiate the access template 244-1, there will be no IMDb-specific information for that entity with which to populate the access template 244-1.

In FIG. 3B, a brief example is shown of how a general (or canonical) entity relates to app-specific information. App-specific entities 248-1 and 248-2 are present in a data store of the search provider. They may be obtained from various data services as described in more detail below. The app-specific entity 248-1 for IMDb includes a hexadecimal identification code (a fictitious ID is shown as an example only) and the title used by IMDb. In addition, other information, such as reviews, actors, producers, etc., may be stored.

The app-specific entity 248-2 for Fandango includes an alphanumeric ID specific to Fandango (a fictitious ID is shown as an example only), and includes the title used by Fandango for the movie. Additional information may also be stored.

A canonical entity 252 includes a canonical title, which in this particular case is the same as the titles for the app-specific entities 248-1 and 248-2. If the titles in the app-specific entities 248-1 and 248-2 differed, one would have to be chosen as the canonical title; or, a canonical title different from both may be used.

A canonical year is included, which may be the same as year information stored in the app-specific entities 248-1 and 248-2. The canonical entity 252 includes links to the app-specific entities 248-1 and 248-2. These links may be in the form of unique key values, such as might be used in a relational database. Additionally or alternatively, specific IDs that match the IDs used in the app-specific entities 248-1 and 248-2 may be used to link the canonical entity 252 to the app-specific entities 248-1 and 248-2. In other implementations, the canonical entity may simply have fields for app-specific data, as opposed to links to distinct app-specific entities.

The canonical entity 252 may also include various metrics, such as a popularity score. This popularity score may be based on how popular the described entity is from the various data sources relied upon and may include information about how frequently the entity is selected when presented to the user. The popularity may be normalized to other entities of the same type, in the same genre, etc.

In FIG. 3C, an access mapping module 260 converts the functional URLs 244 into corresponding access URLs 264-1, 264-2, 264-3, and 264-4, respectively. The mapping is performed using access templates, which take specific information and insert it into a parameterized URL. Access templates are stored in a URL templates data store 268. An entity data store 272 stores information about entities, both canonical and app-specific.

As an example only, if the “movie_id” of the functional URLs 244 was a value corresponding to “Django Unchained,” a 2012 film, the access mapping module 260 would produce the access URLs 264 shown in FIG. 3C. Note that for the access URL 264-4, two app-specific values are retrieved from the entity data store 272: a Google-specific movie name and a Google-specific movie ID. Similar to the association shown in FIG. 3B, all of the app-specific data used to create the access URLs 264 is linked to a single “Django Unchained” canonical entity.
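A minimal sketch of this mapping step, under stated assumptions, is given below. The template string, the store layout, the app name "exampleapp", and the IDs are hypothetical and do not reproduce the URL scheme of any real app; the sketch only shows that an access template is populated from app-specific values and cannot be instantiated when those values are absent.

```python
# Hypothetical parameterized access template, keyed by app name.
ACCESS_TEMPLATES = {
    "exampleapp": "http://exampleapp.example.com/title/{movie_id}/",
}

# Hypothetical entity store: (canonical entity key, app) -> app-specific values.
ENTITY_STORE = {
    ("django_unchained", "exampleapp"): {"movie_id": "0x1A2B"},
}


def to_access_url(app, canonical_key):
    """Instantiate the app's access template, or return None if not applicable."""
    template = ACCESS_TEMPLATES.get(app)
    values = ENTITY_STORE.get((canonical_key, app))
    if template is None or values is None:
        # No app-specific information exists with which to populate the template.
        return None
    return template.format(**values)
```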

Dynamic States

In FIG. 4, functional URLs that correspond to dynamic states are shown. These are URLs that don't correspond to static states, such as a specific movie or specific restaurant. As two examples, dynamic functional URLs may correspond to a state that provides search results of restaurants meeting a particular query or to a state that provides a list of movies meeting a particular query. As described in more detail below, these results can be scraped and converted into deep view cards for display to a user.

A dynamic access mapping module 300 receives functional URLs such as example functional URLs 304-1, 304-2, and 304-3. The dynamic access mapping module 300 relies on parameterized templates stored in a dynamic templates data store 308. These templates are parameterized, meaning that they are concatenations of segments of plain text as well as parameters whose values can be populated, similar to the format string of the printf() function in the C programming language.

In other words, a template designated by the functional URL is instantiated and populated with values provided by the functional URL. A Yelp search template 312-1 is identified by the functional URL 304-1 and results in a dynamic access URL 316-1. Note that the text string “thai near Cupertino,” with spaces converted to plus signs, is filled into the {text} parameter of the template 312-1.
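The instantiation of a parameterized template, including the conversion of spaces to plus signs, might look like the following sketch. The template string and host name are placeholders assumed for illustration; they are not the templates of FIG. 4.

```python
from urllib.parse import quote_plus


def instantiate(template, **params):
    """Fill each {name} parameter of the template with a URL-encoded value."""
    # quote_plus encodes spaces as plus signs and escapes reserved characters.
    encoded = {name: quote_plus(str(value)) for name, value in params.items()}
    return template.format(**encoded)


# Hypothetical search template with a single {text} parameter.
SEARCH_TEMPLATE = "http://search.example.com/search?q={text}"

url = instantiate(SEARCH_TEMPLATE, text="thai near Cupertino")
```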

In another example, an eat24 cuisine search by location template 312-2 is instantiated using a text value of “thai” as well as a value “Cupertino” for the city parameter, both supplied by the functional URL. Alternatively, the value “Cupertino” may be determined based on a location of the device that sent the query. For example, IP geolocation and/or explicit location information from the device may be used to determine the location of the device. From that location, a city may be selected from the entity data store 272 to populate the city parameter of the eat24 template 312-2. The resulting dynamic access URL is shown at 316-2.

A TripAdvisor nearby search by cuisine template 312-3 is instantiated using a value of “thai” for the cuisine_type parameter. In addition, a location ID unique to TripAdvisor is selected from the entity data store 272. This results in dynamic access URL 316-3. The entity data store 272 may store lists of common entity types, such as zip codes, cities, counties, states, cuisine types, etc. The templates 312-1 and 312-2 both allow a plaintext representation of cuisine to be supplied. Meanwhile, the template 312-3 uses cuisine_type, which may correspond to a list of cuisine types in the entity data store 272.

In other examples, the list of cuisine types may be unique to the app, in which case the entity data store 272 may store a list of app-specific cuisine types. The app-specific cuisine type corresponding to the cuisine supplied by the functional URL may be determined and used to instantiate the template. Because the function for the functional URL 304-3 is a nearby search, the location may be supplied by the dynamic access mapping module 300 even though the functional URL 304-3 does not specify a specific location.

The dynamic access URLs 316 may also be referred to as search function URLs (SFURLs), because they access the search functionality of an app. Although displayed in FIG. 4 for illustration purposes as having human-readable and semantically-descriptive content, some (or, in various implementations, all) SFURLs may actually carry little or no semantic content. Instead, an SFURL may simply be a serialized machine-readable data structure for interpretation by the query system.

The present disclosure, instead of relying on access URLs or dynamic access URLs to access web editions of apps, accesses the apps themselves either on physical devices or on emulators. Because apps cannot always be launched to a specific state simply with a URL or other URI (Uniform Resource Identifier), the search system of the present application may navigate to a desired state with a combination of intent calls and user interface (UI) injection. The term “intent” is generally associated with the ANDROID operating system, but is used in this disclosure simply to refer to a programmatic approach to reaching a specific state. Corresponding elements for the IOS operating system may be referred to as view controllers.

UI replay may be used to simulate a user tapping or making other gestures in an app as well as for supplying data, such as text normally entered by a user through a soft keyboard. In various implementations, UI replay may be accomplished using an accessibility framework provided by the operating system or by a search-enabled app. Some states may be reached by a combination of intent invocation and UI replay. For example, an intent may be invoked to arrive at a search state and then UI replay simulates a user typing text and clicking a search button.
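The combination of intent invocation and UI replay in the preceding example can be sketched, purely for illustration, against a stand-in device that only records the actions it is asked to perform. The class, its method names, the intent string, and the element name "search_button" are hypothetical; no real device-control or accessibility API is being represented.

```python
class RecordingDevice:
    """Stand-in for an emulator or physical device; it only logs actions."""

    def __init__(self):
        self.log = []

    def invoke_intent(self, intent):
        self.log.append(("intent", intent))

    def type_text(self, text):
        self.log.append(("type", text))

    def tap(self, element):
        self.log.append(("tap", element))


def search_in_app(device, search_intent, text):
    """Reach a search state via an intent, then replay typing and a tap."""
    device.invoke_intent(search_intent)  # programmatic jump to the search state
    device.type_text(text)               # UI replay: simulated keyboard input
    device.tap("search_button")          # UI replay: simulated tap
    return device.log


device = RecordingDevice()
actions = search_in_app(device, "com.example.app.SEARCH", "thai")
```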

The information for some apps is not feasible to crawl or index, meaning that such a task is too expensive in terms of processing or storage resources, on the part of the search system and the infrastructure supporting the queried app. An on-demand system may have knowledge of entities available for each app, such as is shown in FIG. 3C. In such cases, the on-demand system may navigate to a particular state of an app and extract the content of that state in order to provide a result to a user.

In other implementations, one or more apps may not have been crawled or scraped and, therefore, the search system does not know about states that may be relevant to a search query. As a result, dynamic access, such as shown in FIG. 4, may be used. In these situations, parameterized templates may be specified when incorporating an app into the search system. This process may be referred to as onboarding. During the onboarding process, configurators (which may include human operators and/or automated algorithms) identify what searches can be performed in an app and what types of entity data are specified in that search. For example, when onboarding a hypothetical restaurant review app, one template may be a search based on cuisine type. Therefore, the entity type for the one parameter of this template (similar to the template 312 of FIG. 4) is cuisine.

As described in more detail below, when a user performs a text-based query, a query parser may attempt to identify which entity types are indicated by the query. Once the search system has identified which entity types have been provided in the query, the search system can identify which templates are available that can be instantiated using those entity types. Matching parameterized templates are then used to obtain results for the user's query. As described in more detail below, some of those results may be obtained by doing web queries such as are shown at 316 in FIG. 4. Other results are provided by accessing an app executing within an emulator, navigating to the corresponding state, and performing the search within that state. Results from the search are then selectively provided back to the user.
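One way to picture the template-matching step is as a subset test: a template can be instantiated when every entity type it requires was recognized in the query. The sketch below assumes this interpretation; the template names and the "requires" field are hypothetical.

```python
# Hypothetical onboarded templates and the entity types each one requires.
TEMPLATES = [
    {"name": "cuisine_search", "requires": {"cuisine"}},
    {"name": "cuisine_by_city", "requires": {"cuisine", "location"}},
    {"name": "movie_lookup", "requires": {"movie"}},
]


def matching_templates(query_entity_types):
    """Return names of templates whose required entity types are all present."""
    available = set(query_entity_types)
    return [t["name"] for t in TEMPLATES if t["requires"] <= available]
```

For a query in which a cuisine and a location were recognized, both restaurant templates match, while the movie template does not.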

App State Records

While the above description characterizes app states as an intersection of action (or, function) and entity, app states can instead be indexed in a single-level hierarchy, where each app state corresponds to an app state record. Some search systems may index app states as individual records while other search systems may employ the action/entity model. In other implementations, some apps may lend themselves better to the flat app state record model, and the search system may use both models. As described in the following paragraphs, an app state record may include metadata specifying the function or functions that can be performed using the app state, indicating the similarities between the two representations.

If an app state has already been crawled so as to populate an app state record in a data store (such as a database) of the search system, the ability to perform an on-demand crawl of the app state may be less important. However, for app states whose contents change frequently and/or unpredictably, an on-demand crawl may be the best way of providing up-to-date data to a user.

In FIG. 5A, an example format of an app state record 404 includes a state identifier (ID) 404-1, app state information 404-2, an app identifier (ID) 404-3, and one or more access mechanisms 404-4 used to access the app state.

The state ID 404-1 may be used to uniquely identify the app state record 404 among the other app state records included in a data store. In some examples, the state ID 404-1 describes a function and/or an app state in human-readable form. For example, the state ID 404-1 may include the name of the app referenced in the access mechanisms 404-4.

In a specific example, a state ID 404-1 for an Internet music player app may include the name of the Internet music player app along with the song name that will be played when the Internet music player app is set into the state defined by the access mechanism 404-4 included in the app state record 404. In some examples, the state ID 404-1 includes a string formatted similarly to a uniform resource locator (URL), which may include an identifier for the app and an identifier of the state within the app. In other implementations, a URL used as the state ID 404-1 may include an identifier for the app, an identifier of a function to be provided by the app, and an identifier of an entity that is the target of the function.

The app state information 404-2 may include data that describes an app state into which an app is set according to the access mechanisms 404-4 in the app state record 404. The types of data included in the app state information 404-2 may depend on the type of information associated with the app state and the functionality specified by the access mechanisms 404-4. The app state information 404-2 may include a variety of different types of data, such as structured, semi-structured, and unstructured data.

The app state information 404-2 may be automatically and/or manually generated and updated based on documents retrieved from various data sources. For example, the data sources may include app developers' websites, data feeds provided by developers, digital distribution platforms, blogs, app review websites, social networking sites, and online databases.

In some examples, the app state information 404-2 includes data presented to a user by an app when in the app state corresponding to the app state record 404. For example, if the app state record 404 is associated with a music player app, the app state information 404-2 may include data that describes a song (e.g., name and artist) that is displayed and/or played when the music player app is set to the specified app state.

When the app state record 404 corresponds to a default state of an app, the app state information 404-2 may include information generally relevant to the app and not to any particular app state. For example, the app state information 404-2 may include the name of the developer of the app, the publisher of the app, a category (e.g., genre) of the app, a text description of the app (which may be specified by the application's developer), and the price of the app. The app state information 404-2 may also include security or privacy data about the app, battery usage of the app, and bandwidth usage of the app. The app state information 404-2 may also include app statistics, such as number of downloads, download rate (for example, average downloads per month), download velocity (for example, number of downloads within the past month as a percentage of all-time downloads of the app), number of ratings, and number of reviews.

The app ID 404-3 uniquely identifies an app associated with the app state record 404. The access mechanisms 404-4 specify one or more ways that the state specified by the app state record 404 can be accessed. For any given user device, only some of the access mechanisms 404-4 may be relevant.

For illustration, in FIG. 5B an example app state record 408 includes a state ID 408-1 in the form of human-readable text: “Free Photo Editor: Edit An Image.” The example app state record includes app state information 408-2, including app category, state name, text description, user reviews (numerical and/or text), and available functions. For example, the available functions for this state may include cropping the image, rotating the image, and removing red eye.

An app ID 408-3 uniquely identifies the Free Photo Editor app. The app ID 408-3 may refer to a canonical Free Photo Editor software product that encompasses all of the editions of the Free Photo Editor app, including all the native versions of the Free Photo Editor app across platforms (for example, the IOS operating system and the ANDROID operating system) and any web editions of the Free Photo Editor app.

There are three access mechanisms 408-4 shown: a web access mechanism, a native app access mechanism, and a native download access mechanism. The web access mechanism may take the form of a URL (uniform resource locator) that corresponds to a web page for “Edit An Image” on the Free Photo Editor website.

The native access mechanism may include an app resource identifier for the native edition of the Free Photo Editor app on a particular operating system and one or more operations that navigate to the state in the Free Photo Editor app for the Edit An Image state. In various implementations, and for various app states, an access mechanism may be able to directly access the state (such as by using an ANDROID operating system intent). If the Free Photo Editor: Edit An Image app state is available on multiple operating system platforms, there would generally be multiple native access mechanisms.

The download access mechanism may include instructions to open a portal to a digital distribution platform to download and install the app, followed by opening the app and navigating to the correct state, at which point the opening and the navigating may be the same as the native access mechanism. In other words, the actions taken by the download access mechanism may be a superset of those of the native access mechanism.
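The record format of FIG. 5A and the three access mechanisms of FIG. 5B might be modeled, as a rough sketch, like this. The field names, the platform labels, the step strings, and the app ID value are assumptions for illustration; the sketch shows only that some mechanisms are relevant to a given device and that the download mechanism's steps contain the native mechanism's steps.

```python
from dataclasses import dataclass, field


@dataclass
class AccessMechanism:
    kind: str       # "web", "native", or "download"
    platform: str   # "any", "android", or "ios" (labels assumed here)
    steps: list     # ordered operations, as plain descriptions


@dataclass
class AppStateRecord:
    state_id: str
    app_id: str
    state_info: dict = field(default_factory=dict)
    access_mechanisms: list = field(default_factory=list)

    def relevant_mechanisms(self, platform):
        """Keep only the mechanisms usable on the given device platform."""
        return [m for m in self.access_mechanisms
                if m.platform in ("any", platform)]


native_steps = ["open app", "navigate to Edit An Image"]
record = AppStateRecord(
    state_id="Free Photo Editor: Edit An Image",
    app_id="free-photo-editor",  # hypothetical app ID
    state_info={"functions": ["crop", "rotate", "remove red eye"]},
    access_mechanisms=[
        AccessMechanism("web", "any", ["open URL in browser"]),
        AccessMechanism("native", "android", native_steps),
        # Download = install first, then the same steps as the native mechanism.
        AccessMechanism("download", "android",
                        ["install from distribution platform"] + native_steps),
    ],
)
```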

App Records

In some implementations, app results may be returned in addition to app state results. For example, if the user appears to have provided only an action with no specified entity, identifying applications that can perform that action may be more helpful than providing results to specific instances of those functions in an app. Further, when an app state or a dynamic search cannot be identified for a user query, one or more relevant apps may provide the functionality the user is looking for.

In FIG. 6A, an example format of an app record 424 includes an app name 424-1, an app identifier (ID) 424-2, and app attributes 424-3. The app record 424 generally represents data relevant to a data store for a specific app. A data store may include thousands or millions of records having the structure specified by the app record 424. The app ID 424-2 uniquely identifies an app in the data store. The app ID 424-2 may be assigned by the search system 132 and may therefore be independent of any ID assigned by, for example, a digital distribution platform.

A single value for the app ID 424-2 may cover multiple app editions. The term “edition” applies to multiple versions of a single software product and may also apply to versions of that software product released for alternative operating systems. For example only, Angry Birds (as shown in FIG. 6B) may be available on Android and iOS mobile device platforms and, for each platform, may have a series of versions as bug fixes are released and as the app is updated to take advantage of, and to adapt to, newer versions of the operating system. For some or all of the states, the software product may also have a web edition, which may be accessed using a browser.

In FIG. 6B, an example app record 428 for an ANGRY BIRDS app includes a name 428-1 of “Angry Birds” and a unique ID 428-2 expressed in hexadecimal as 0x3FF8D407. Attributes 428-3 for Angry Birds may include a name of the developer of Angry Birds, text reviews of Angry Birds, a genre indicator for Angry Birds (such as “Games,” or sub-genre “Physics-Based Games”), ratings (such as star ratings) for Angry Birds, a textual description (which may be provided by the developer), a number of downloads (which may be restricted to the most recent edition or could be for all editions), access mechanisms (how to open Angry Birds when already installed or how to install Angry Birds when not yet installed), and device info (for example, minimum requirements of operating system, hardware, and resolution for best operation).

In some examples, a single software product can provide more than one function. For example, a restaurant reservation app may also allow a user to read user reviews for a restaurant in addition to making reservations. As another example, a media player app may also allow a user to perform searches for digital media, purchase digital media, generate media playlists, and share media playlists.

The functions of a software product may be accessible using native app editions of the software app and/or web app editions of the software app. A native edition (or, “native application”) is, at least in part, installed on a user device. In some scenarios, a native app is installed on a user device, but accesses an external resource (e.g., a database server) to obtain data from the external resource. For example, social media apps, weather apps, news apps, and search apps may respectively be accessed by one or more native apps that execute on various user devices.

In other scenarios, a native app is installed on the user device and does not access any external resources. For example, some gaming apps, calendar apps, media player apps, and document viewing apps may not require a connection to a network to perform a particular function. In these examples, the functionality of the software product is encoded in the native app itself.

Web editions (also referred to as “web applications”) of a software product may be partially implemented by a user device (such as by a web browser executing on the user device) and partially implemented by a remote computing device (such as a web server or app server). For example, a web app may be an app that is implemented, at least in part, by a web server and accessed by a web browser native to the user device. Example web apps include web-based email, online auction websites, social-networking websites, travel booking websites, and online retail websites. A web app accesses functions of a software product via a network.

When rendering a set of app search results, a user device displays a set of user-selectable links that can be selected by a user of the user device. A user-selectable link may include one or more underlying access mechanisms. A user-selectable link, when selected by a user, causes the user device to access a software product using an edition of the software app identified by the access mechanism.

Examples of access mechanisms include native access mechanisms, web access mechanisms, download access mechanisms, and scripts. A native access mechanism may be a string that includes a reference to a native app and indicates one or more operations for the user device to perform. If a user selects a user-selectable link including the native access mechanism, the user device may launch the corresponding native app.

In some implementations, any combination of the operating system of the user device, a search app executed by the user device, a native app executed by the user device, and/or a web browser executed by the user device can launch the native app referenced in the native access mechanism.

A web access mechanism may be a resource identifier that includes a reference to a web resource (e.g., a page of a web application/website), such as a uniform resource locator (URL) used with hypertext transfer protocol (HTTP). If a user selects a user-selectable link including a web access mechanism, the user device may launch a web browser app and may pass the resource identifier to the web browser.

An app download access mechanism may indicate a location (such as a digital distribution platform) where a native app can be downloaded in the scenario where a native app edition of the app is not installed on the user device. If a user selects a user-selectable link including an app download access mechanism, the user device may access a digital distribution platform from which the referenced native app edition may be downloaded. The user may opt to download the native app edition. Upon installation, the user device may automatically launch the native app edition.

A script access mechanism is a set of instructions that, when executed by the user device, cause the user device to access a resource indicated by the script. For example, the script may instruct an operating system of the user device to: launch a digital distribution platform interface app; browse to the specified native app within the digital distribution platform interface app; install the specified native app; and then open the specified native app.

Block Diagrams

In FIG. 7A, an example implementation of the search system 100 includes a query understanding pipeline 504 that receives the query wrapper and parses the query to determine search tokens and infer the intent of the query. For example only, see commonly-assigned U.S. patent application Ser. No. 14/566,283, filed Dec. 10, 2014, titled “Rules-Based Generation of Search Results,” with first-named inventor Michael Harris, the entire disclosure of which is incorporated by reference. See also commonly-assigned U.S. Provisional App. No. 62/179,674, filed May 13, 2015, titled “Application Search System,” with first-named inventor Leigh Klotz, the entire disclosure of which is incorporated by reference.

The query understanding pipeline 504 may provide one or more query parses to an app search module 508, which may identify apps relevant to the query parses from an app record store 512. For example, records in the app record store 512 may have a format similar to that of the app record 424 of FIG. 6A.

The query understanding pipeline 504 may also provide one or more query parses to an index search module 516. The index search module 516 consults an app state record store 520, which may store app state records consistent with the format of FIG. 5A. The index search module 516 may operate based on text indices, such as inverted indices, and may use metrics such as TF-IDF (term frequency-inverse document frequency) in determining matches between a query parse and an app state. Selected app states are provided to a scoring module 514. Similarly, selected app results are provided from the app search module 508 to the scoring module 514.
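For readers unfamiliar with TF-IDF, a basic version of the metric is sketched below. This is a textbook formulation assumed for illustration (term frequency normalized by document length, and a plain logarithmic inverse document frequency); the index search module may use a different variant, and the record contents shown are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(query_tokens, documents):
    """Score each document (state ID -> list of tokens) against the query."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))

    scores = {}
    for state_id, tokens in documents.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_tokens:
            if counts[term]:
                tf = counts[term] / len(tokens)
                idf = math.log(n_docs / doc_freq[term])
                score += tf * idf
        scores[state_id] = score
    return scores


# Hypothetical tokenized app state records.
docs = {
    "photo_edit": ["edit", "image", "crop"],
    "music_play": ["play", "song"],
    "video_edit": ["edit", "video"],
}
scores = tf_idf_scores(["edit", "crop"], docs)
best = max(scores, key=scores.get)
```

The rarer term "crop" contributes more than the more common term "edit", so the record containing both ranks first in this sketch.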

The query understanding pipeline 504 may also provide one or more query parses to a rules-based search module 524. The rules-based search module 524 consults a set of rules from a rules store 528. For example, the rules store 528 may identify terms relevant to certain verticals. For example, the verbs watch, stream, and download may be actions relevant to video content. A rule from the rules store 528 may therefore attempt to find a functional URL that accomplishes the specified function using an entity from the entity data store 272. For example, if the first token of a query parse is the verb watch, and there is only one additional token in the parse, a rule may attempt to match the second token to a movie name in the entity data store 272.
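The "watch" rule in the preceding example might be sketched as follows. The movie table, the entity key, and the functional URL format ("func://...") are hypothetical placeholders; the actual format of functional URLs is not assumed here.

```python
# Hypothetical slice of the entity data store: movie name -> entity key.
MOVIE_ENTITIES = {"django unchained": "movie_0001"}


def apply_watch_rule(tokens):
    """If the parse is ['watch', <movie name>], return a functional URL."""
    if len(tokens) != 2 or tokens[0].lower() != "watch":
        return None  # rule does not apply to this parse
    movie_key = MOVIE_ENTITIES.get(tokens[1].lower())
    if movie_key is None:
        return None  # second token is not a known movie name
    # Placeholder functional URL format, assumed for illustration only.
    return f"func://watch?movie_id={movie_key}"
```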

The rules-based search module 524 may have additional context rules. For example, if one of the tokens in the query parse is “cheap”, the rules-based search module 524 may attempt to find functional URLs for which a price can be specified. As another example, when one of the tokens in a query parse is “YELP”, the rules-based search module 524 may select a functional URL corresponding to the YELP app where the remaining tokens in the query parse can supply the parameters of the functional URL. Even if the remaining tokens do not identify specific entities in the entity data store 272, the rules-based search module 524 may perform search forwarding to the dynamic acquisition module 536 so that the YELP app itself can return results for the remaining tokens.

The query understanding pipeline 504 may attempt to identify which types of entities are specified in the query. This can be used to select relevant rules. For example, if the query wrapper is determined to include an entity of type cuisine and an entity of type location, rules that require both the cuisine and location will be applicable (such as a restaurant search within a particular app). Additional information about identifying, and searching based on, entity types is provided in commonly assigned U.S. Provisional App. No. 62/220,737 filed on Sep. 18, 2015, titled “Entity-Type Search System,” with first-named inventor Sudhir Mohan, the entire disclosure of which is incorporated by reference.

For rules that apply to static states, an entity-based retrieval module 532 may be provided with the query parse. The entity-based retrieval module 532 attempts to identify a corresponding entity in the entity data store 272 and to form a functional URL for provision to the scoring module 514. When the rules-based search module 524 identifies a rule corresponding to a dynamic state, a dynamic acquisition module 536 may be provided with the query parse.

In various implementations, in situations where the entity-based retrieval module 532 is not able to identify a corresponding entity in the entity data store 272, the entity-based retrieval module 532 may signal that failure to the rules-based search module 524. The rules-based search module 524 may then convert that query into a dynamic query and send the dynamic query to the dynamic acquisition module 536.

In other implementations, the rules-based search module 524 attempts to identify dynamic queries regardless of whether results are returned from the entity-based retrieval module 532. In some implementations, the entity-based retrieval module 532 may even be omitted. The rules-based search module 524 may, as described above, identify which rules or templates from the rules store 528 can be satisfied with the entity types recognized by the query understanding pipeline 504. Rules that can be instantiated by entity types matching the entities identified by the query understanding pipeline 504 are provided to the dynamic acquisition module 536.

In various implementations, multiple alternative query parses are provided to the rules-based search module 524. For example, a single text string from the query wrapper may correspond to different types of entities. As just one example, the name of a movie may be in common with the name of a corresponding book, a remake of the movie, and a video game based on the remake. This corresponds to three different types of entities. As a result, three different groups of rules may be instantiated based on these entity types.

To avoid overwhelming the dynamic acquisition module 536 with dynamic queries, the rules-based search module 524 may rely on signals from the query understanding pipeline 504 regarding which entity types are most likely. For example, the query understanding pipeline 504 may include, in a simplistic form, a dictionary of entity names along with their relative popularities. The rules-based search module 524 may only forward rules to the dynamic acquisition module 536 when they correspond to entity types having the highest likelihood from among the potential entity types. The dynamic acquisition module 536, as described in more detail below, operates one or more emulator instances 550 to dynamically execute and scrape content from an app. The dynamic acquisition module 536 then provides deep view card (DVC) content to a result generation module 560.
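The forwarding filter described at the start of the preceding paragraph can be illustrated with a short sketch that keeps only the rules tied to the most likely entity type. The rule names, entity type labels, and likelihood values are made up for illustration.

```python
def most_likely_rules(candidates, likelihood):
    """Keep rules whose entity type has the highest likelihood.

    candidates: list of (rule_name, entity_type) pairs.
    likelihood: mapping of entity_type -> relative likelihood.
    """
    if not candidates:
        return []
    best = max(likelihood.get(etype, 0.0) for _, etype in candidates)
    return [rule for rule, etype in candidates
            if likelihood.get(etype, 0.0) == best]


# A title shared by a movie, a book, and a video game (values are invented).
candidates = [
    ("movie_rules", "movie"),
    ("book_rules", "book"),
    ("game_rules", "video_game"),
]
likelihood = {"movie": 0.7, "book": 0.2, "video_game": 0.1}
forwarded = most_likely_rules(candidates, likelihood)
```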

The scoring module 514 receives functional URLs from the app search module 508, the index search module 516, the rules-based search module 524, and the entity-based retrieval module 532. The scoring module 514 assigns scores to each result and provides the top-scoring results to the result generation module 560.

The result generation module 560 includes storage for data indicating the reliability of deep view content from various apps. The result generation module 560 is therefore able to sort deep view content from the dynamic acquisition module 536 and more prominently display deep view content from reliable apps. The result generation module 560 also arbitrates between displaying deep view content and scored results from the scoring module 514. Once results are chosen from the scored results from the scoring module 514 and the deep view content from the dynamic acquisition module 536, search results are returned to the requesting device.

In FIG. 7B, the dynamic acquisition module 536 of FIG. 7A is replaced with a client API (application programming interface) generation module 600. The client API generation module 600 prepares instructions for a client device to make an API call to the search system and to generate DVC content based on the response to the API call.

In various implementations, the information about the API call and DVC generation may be encapsulated in a descriptive format such as JSON (JavaScript Object Notation). In other implementations, the client API generation module 600 generates code, such as JavaScript code, to make an API request and to generate DVC content based on the results. This code can be executed by the client device to make API calls and to generate DVCs.

In FIG. 8A, an example implementation of the dynamic acquisition module 536 of FIG. 7A is shown. A query assembly module 704 receives a query URL from the rules-based search module 524. A query URL may be partially unpopulated. For example, as shown in FIG. 4, the functional URL 304-3 does not specify a location ID as the corresponding template 312-3 requires. The query assembly module 704 therefore consults the entity data store 272 to identify relevant entities to populate (or, instantiate) the template. The query assembly module 704 may also convert between analogous entity types. For example, some apps may represent location using latitude-longitude, while others use zip code, and still others use city name.

Based on the dynamic templates, and as demonstrated in FIG. 4, the query assembly module 704 instantiates the relevant template based on parameter values from the query URL as well as from the entity data store 272. The completed query URL is provided to a cache controller 708. The cache controller 708 determines whether contents related to the query URL have already been obtained. A cached result store 712 stores data previously retrieved from apps.

The cache controller 708 may apply various eviction policies, including first-in first-out, and may supplement eviction policies based on variables such as frequency of access and change rate. In other words, cached results that are accessed more frequently are less likely to be evicted from the cached result store 712. Meanwhile, search results that appear to change frequently are evicted more quickly from the cached result store 712.
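One possible way to combine these signals, offered only as a sketch, is a single eviction priority that grows with the age of an entry (the first-in first-out component) and with its change rate, and shrinks with its access count. The formula, field names, and units below are assumptions and are not specified by this disclosure.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    inserted_at: float   # insertion time, in seconds
    access_count: int    # how many times the entry has been read
    change_rate: float   # observed changes per hour (assumed unit)


def eviction_priority(entry, now):
    """Higher value means the entry should be evicted sooner."""
    age = now - entry.inserted_at
    return age * (1.0 + entry.change_rate) / (1.0 + entry.access_count)


def pick_victim(cache, now):
    """Return the key of the entry with the highest eviction priority."""
    return max(cache, key=lambda key: eviction_priority(cache[key], now))


cache = {
    "hot": CacheEntry(inserted_at=0.0, access_count=50, change_rate=0.0),
    "cold": CacheEntry(inserted_at=0.0, access_count=0, change_rate=0.0),
    "volatile": CacheEntry(inserted_at=0.0, access_count=0, change_rate=5.0),
}
victim = pick_victim(cache, now=100.0)
```

With equal ages, the frequently changing entry is evicted first and the frequently accessed entry last.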

If the cached result store 712 already contains the results of a prior scrape of the query URL, then the results are simply passed through to a deep view card (DVC) creation module 716. Otherwise, the cache controller 708 provides the query URL to an emulator controller 720. The emulator controller 720 controls the emulator instances 550. In FIG. 8A, there are N emulator instances 550-1 through 550-N.

Each of the emulators 550 may be operating in a cloud hosting environment or in servers owned and/or administered by the operator of the search system 100. In some implementations, the emulators 550 are operated on general purpose computing instances using emulator software provided by the search system 100. In other implementations, the emulators 550 are pre-configured emulator instances provided by a cloud emulation hosting provider.

In emulator 550-1, the host operating system (such as the ANDROID operating system or the APPLE IOS operating system) is running. In some implementations, some operating systems may not have suitable emulators, in which case physical devices may run the operating systems and serve as the emulators 550. For example only, a bank of physical smartphones may all be electrically connected (such as via USB) to an interface card that is controlled by the emulator controller 720.

Physical devices may be provided via a cloud platform so that the physical devices can be provisioned as needed. The availability of devices will be limited by the number of physical devices that the cloud provider possesses. The physical devices may be loaded with a modified copy of a standard operating system, which may have been modified with additional accessibility, control, or user interface scripting features.

In some implementations, simulators may be used. A simulator may be considered a species in the emulator genus. Therefore, when the term emulator is used in this disclosure, both full hardware emulators as well as software simulators are encompassed by the term. Similarly, the term “device” may be a superset of emulator, and include both physical hardware devices as well as emulators.

While a full hardware emulator emulates device hardware and allows an exact or lightly-modified copy of an operating system to execute on the emulated hardware, a simulator offers a subset of the features of the full hardware emulator. For example, the simulator may simulate responses to API (Application Programming Interface) calls that a typical app would expect to invoke while executing.

In a simulation, the full native operating system running on real or emulated hardware is replaced by a set of libraries that, for most apps, are ideally indistinguishable, as far as an app is aware, from executing on an actual device running that operating system. Limitations of simulators may include a lack of sensor inputs (such as accelerometers), an inability to place or receive telephone calls, and the absence of accurate GPS (Global Positioning System) data.

Because simulators do not attempt to emulate specific hardware, they often require fewer processing and memory resources from the host operating system. This allows many more simulators to be run on a server than emulators. For example only, compared to hardware emulators, ten times more simulators may be simultaneously hosted. As described in more detail below, each simulator may be used for a different app, although multiple simulators may host copies of the same app for apps that are more popular, more rapidly changing, or that require more time to acquire results.

For example, apps written for the iOS operating system from Apple, Inc. may be loaded into simulators executing on the OS X operating system from Apple, Inc. One instance of the OS X operating system may be configured to host multiple simulators. In one example, eight simulators are hosted. In fact, a server (such as a server running a variant of the LINUX operating system) may include multiple virtual machines, each operating a virtualized instance of the OS X operating system, each OS X operating system hosting multiple simulators.

Similar to iOS platform simulation on OS X, some ANDROID operating system apps may be executed using a runtime, such as Android Runtime for Chrome (ARC; alternatively, App Runtime for Chrome). An app may be referred to as an APK, the filename extension for a packaged version of an app. The APK may be repackaged or recompiled to work in a simulated environment. In some cases, this may be done programmatically by a software program, without manual intervention.

The ARC may be integrated with the CHROME operating system and allows an app to run in a sandbox (a restricted memory area segregated from the rest of the operating system). In other implementations, the ARC may be implemented as an extension to the CHROME browser. For example, an implementation of the ARC framework named ARChon may allow an Android app to execute within a sandbox in a Chrome browser on operating systems other than Chrome OS, such as the Windows operating system, the Linux operating system, and the OS X operating system.

In the emulator 550-1, one or more executing apps including an executing app 724-1 are operating in the emulated operating system. A management agent 728-1 determines which apps are installed in the emulator 550-1 as well as which apps are currently executing. A context, navigation, and scraping agent 732-1 indicates to the management agent 728-1 that an app needs to be downloaded, installed, and/or executed to service a command from the emulator controller 720.

Generally, the emulator controller 720 will send commands to an emulator already possessing, and ideally already executing, the desired app. An assignment module 736 of the dynamic acquisition module 536 tracks which apps are executing on which of the emulators 550. Based on load, the assignment module 736 may allocate more or fewer of the emulators 550 for certain apps. The emulator controller 720 may consult the assignment module 736 to determine which of the emulators 550 a scraping command should be sent to.

Returning to the emulator 550-1, the agent 732-1 provides context to the executing app 724-1, such as by forcing location services of the operating system to indicate that the emulator 550-1 is actually operating at the same location as the user device that sent the search query. The agent 732-1 may also set a time of the emulator 550-1 to match a time zone of the requesting user device.

The agent 732-1 focuses the executing app 724-1 and navigates to a desired state according to a predetermined breadcrumb trail. In various implementations, opening and navigating may be performed with a single API (Application Programming Interface) call, which may be referred to in some circumstances as an intent. The agent 732-1 may also inject user interface events into the executing app 724-1 to navigate from one state to another.

Once a desired state is reached, such as a search results state for a commanded search, the agent 732-1 scrapes text, images, and metadata from the state of the executing app 724-1. In various implementations, the agent 732-1 may also extract other types of media, such as audio or video. The scraped content is returned to the emulator controller 720, which provides the results and the metadata to the cache controller 708.

The cache controller 708 forwards the data to the DVC creation module 716 and optionally stores the retrieved data into the cached result store 712. The DVC creation module 716 consults a DVC template store 740 to determine which data from the search results to incorporate into DVCs. In addition, the DVC template store 740 may include the layout and formatting of the data for creating DVCs. The DVC creation module 716 outputs completed DVCs to the result generation module 560, such as in FIG. 7A.

In FIG. 8B, an alternative implementation corresponding to the client API generation module 600 of FIG. 7B is shown. For various reasons, such as latency, it may be desirable for the client to make a request for DVC content to be scraped and returned directly to the client, rather than for that content to be acquired at the search system and then provided to the client. Also, the client may be able to provide additional, or more accurate, context within an API call, such as exact location, sensor data, screen resolution, installed apps, and active registered accounts.

Therefore, the client API generation module 600 may provide a pre-formatted API call as well as instructions for DVC generation to the client. Client code 800, which may have been received in a software development kit (SDK) provided by the search system 100, or may be included in a standalone search app, receives results from the search system 100. Some of the results may have been generated by the client API generation module 600 and therefore have one or more API calls, which are provided to a search system API call module 804, and deep view card (DVC) instructions, which are provided to a DVC creation module 808. All of the other results are provided directly to a result display module 812.

The search system API call module 804 selectively sends an API call to a search system API service 840, which responds with DVC content. In various implementations, a local cache 816 may store DVC content, which would eliminate the need to make an API call for content that is cached. Further, the search system API call module 804 may wait to send an API call until it appears that the user will actually view the resulting DVC content. As one example, API calls for results that would not appear in the first screen of results may not be sent immediately. In addition, the search system API call module 804 may recognize that some API calls may not be relevant to a user of the device, such as if the API call relates to an app the user has indicated a dislike for.

The client API generation module 600 includes the query assembly module 704, which may be similar to that shown in FIG. 8A. A DVC parsing and display instructions module 844 consults the DVC template store 740 to determine how to parse returned content and generate DVCs for the query generated by the query assembly module 704. These instructions are provided as results transmitted to the client code 800 via the result generation module 560.

To mitigate delay between the query URL generation and the later API call by the client code 800, a preload alert may be generated in response to the query URL and provided to a network communication module 860 of the search system API service 840. The assignment module 736 may ensure that at least one app related to the expected API request is installed on one of the emulators 550. In addition, the emulator controller 720 may begin to speculatively navigate to the desired state of the app so that the expected API call can be serviced promptly.

The network communication module 860 receives the API call from the search system API call module 804 of the client code 800 and provides the query URL to the cache controller 708, which may be similar to the cache controller 708 of FIG. 8A. When the cache controller 708 returns results, either from the cached result store 712 or from the emulator controller 720, the network communication module 860 provides the content to the search system API call module 804. The search system API call module 804 provides the content to the DVC creation module 808.

Based on DVC instructions related to parsing and display, the DVC creation module 808 generates, formats, and scales (according to the screen size and resolution of the device) DVCs. These DVCs are provided to the result display module 812, which combines the DVCs from the DVC creation module 808 with any other results provided by the search system 100.

In FIG. 9, another example implementation of a search system API service 900 is shown. The search system API service 900 includes a director 904 with a query receiver 908 that receives queries. For example, queries may be received over a network connection from a user device, such as a user device programmed with the client code 800 of FIG. 8B.

A query router 912 of the director 904 determines whether the query can be satisfied from cached results by consulting a cache manager 920. If the query cannot be satisfied by cached results, if caching is not enabled in the search system API service 900, or if the cached results are out of date, the query router 912 forwards the query to one or more of a web scrape system 924, an API passthrough 928, and a live scrape system 932.

Results that are substantially out of date may have already been deleted by the cache manager 920. Meanwhile, cached results that are only slightly out of date may be verified by passing the query to the web scrape system 924, the API passthrough 928, and/or the live scrape system 932. Results that are being verified may immediately be returned for presentation to a user. Results that have changed or been deleted can be updated in a further message to the user device. This increases responsiveness, and the later update may not even be discernible to a user of the user device. Results that are frequently found to have changed during verification indicate that the app updates more frequently than the cache is updated. The data should therefore be evicted from the cache or marked as stale more quickly.

Results from one of these sources or from a combination of the sources are provided from the query router 912 to a content provider 936. The content provider 936 provides results of the query back to a client. For example only, results of the query may be in the form of one or more DVCs. A DVC builder 940 may construct DVCs from data obtained by the content provider 936. The DVC template store 740 may determine what data is relevant for the DVC and may specify how to format the DVC.

In various implementations, the content provider 936 receives some results already in a DVC format and receives some results with raw data not already formatted as a DVC. Results that are already in the form of DVCs may in some implementations still be provided to the DVC builder 940, which may apply a different template, as specified by the DVC template store 740. Different DVC templates may be used for different types of user devices. For example, a different size or style of DVC templates may be specified for a tablet as compared to a phone.

A query frequency analyzer 944 analyzes incoming queries from the query receiver 908 and identifies queries that are received frequently enough that those queries should be cached. A determination of whether to cache the results of the query is indicated to the cache manager 920. In some implementations, all results are cached until the cache manager 920 runs out of storage space allocated for query result storage. In such implementations, the query frequency analyzer 944 may help the cache manager 920 determine which results to evict first from storage in order to make room for new results.

Without a query frequency analysis, the cache manager 920 may simply evict the oldest results in a first in, first out scheme. The query frequency analyzer 944 may have a higher threshold that determines when a query is popular enough that the results for that query should be pre-fetched in anticipation of further receipt of such a query. These queries may be rerun on a periodic basis so that results are fresh. Further, these queries of interest may be precluded from eviction from cache storage.
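
The two-threshold behavior described above can be sketched, under simplifying assumptions, with a plain counter. The class and method names below are hypothetical, and the sketch counts queries over all time rather than over a sliding window, which a practical analyzer would likely use.

```python
from collections import Counter


class QueryFrequencyAnalyzer:
    """Counts queries and applies a lower (cache) and higher (pre-fetch) threshold."""

    def __init__(self, cache_threshold, prefetch_threshold):
        if prefetch_threshold < cache_threshold:
            raise ValueError("prefetch threshold should be the higher of the two")
        self.cache_threshold = cache_threshold
        self.prefetch_threshold = prefetch_threshold
        self.counts = Counter()

    def record(self, query):
        self.counts[query] += 1

    def should_cache(self, query):
        # Frequent enough that results are worth caching.
        return self.counts[query] >= self.cache_threshold

    def should_prefetch(self, query):
        # Popular enough to be treated as a query of interest and rerun periodically.
        return self.counts[query] >= self.prefetch_threshold
```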

The cache manager 920 may determine how frequently the results change for these queries of interest and adjust the periodic retrieval rate accordingly. In addition, some queries remain static during certain conditions, such as certain times of the day or days of the week. For example, live public transit data may not change from a time when the last train stops running at midnight until the next train begins at 5:00 a.m.

The cache manager 920 may therefore define conditions during which certain queries are not rerun, thereby decreasing the resources consumed by both the search system API service 900 and the source of the data. The cache manager 920 provides queries of interest to a cache builder 948. The cache builder 948 then requests results from one or more of the web scrape system 924, the API passthrough 928, and the live scrape system 932. The cache builder 948 may have access to the same routing information as the query router 912, or may use the query router 912 to select an appropriate system for a query.

A cache initializer 952 may have a set of pre-stored queries that are provided to the cache builder 948 prior to the query frequency analyzer 944 identifying queries of interest. The cache initializer 952 may be populated by an operator of the search system API service 900, who may have empirical data and educated guesses about which queries will prove popular.

The web scrape system 924 may implement a traditional web scraping algorithm or may be glue logic for interfacing with a traditional web scraper. In various implementations, a functional URL received from the query router 912 may be transformed into a web URL, and the web scrape system 924 then accesses the web URL using HTTP (hypertext transfer protocol). Content, such as an HTML (hypertext markup language) document, is returned by the web server identified by the web URL and scraped.

The API passthrough 928 passes an API call through to a third-party app that publicly exposes an API. The API passthrough 928 may include or access a data store with records indicating how to transform a functional URL into an API call recognized by the third-party API. In various implementations, the third-party API may not be documented but is publicly accessible. If the undocumented third-party API ceases to work as expected or is blocked, the query router 912 may direct requests for that API instead to the app itself in the live scrape system 932.

In FIG. 10, an example implementation of the live scrape system 932 is presented. A master controller 1004 receives queries, such as from the query router 912 or the cache builder 948 of FIG. 9. The master controller 1004 controls a set of emulation servers 1008 via a server controller 1012. As described in FIG. 11, each server of the set of emulation servers 1008 includes multiple emulator instances.

The server controller 1012 initializes each server of the set of emulation servers 1008, starts and stops (including, in various implementations, powering up and powering down) each server of the set of emulation servers 1008, and manages the health of each of the emulation servers 1008. Servers may be started and stopped depending on load of the live scrape system 932.

In a virtualized hosting environment, the server controller 1012 may start or stop server instances, thereby growing or shrinking the size of the set of emulation servers 1008, according to demand, either current or expected. The server controller 1012 may recognize patterns over time, such as whether days of the work week have a higher demand than do days of the weekend. In addition, the server controller 1012 may recognize the pattern that a higher demand is present in the evenings than in the morning or afternoon.

The server controller 1012 may monitor a load on the live scrape system 932, which may be measured as a percentage of total capacity, and may start or power on servers from the set of emulation servers 1008 when the utilization rises above a first threshold. In some implementations, a second, lower threshold may be used by the server controller 1012 to determine when to power down, idle, or stop servers in the set of emulation servers 1008. This hysteresis prevents brief lulls in activity from causing servers to be stopped that then quickly need to be restarted. Hysteresis may also be created by waiting for a predetermined amount of time with utilization below the second threshold before stopping servers. To prevent instantaneous spikes in activity from causing more servers to be started, a time delay may be applied to the first threshold as well. In other implementations, a moving average may smooth brief peaks and troughs in utilization.
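
A minimal sketch of the two-threshold hysteresis follows. The threshold values, the one-server step size, and the server count limits are assumptions for illustration; the time delays and moving average mentioned above are omitted for brevity.

```python
def desired_server_count(utilization, running, start_above=0.80, stop_below=0.40,
                         min_servers=1, max_servers=10):
    """Return how many servers should run, given utilization in the range 0..1.

    Between the two thresholds nothing changes, so a brief lull or a brief
    spike that stays inside the band does not start or stop a server.
    """
    if utilization > start_above and running < max_servers:
        return running + 1
    if utilization < stop_below and running > min_servers:
        return running - 1
    return running
```

For example, with three servers running, a utilization of 0.6 leaves the count unchanged, 0.9 raises it to four, and 0.3 lowers it to two.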

A device controller 1016 interfaces with the devices on each of the set of emulation servers 1008. The device controller 1016 initializes each device with an operating system and optionally one or more default apps. Although the present disclosure will be described in terms of emulators, the principles of the present disclosure also apply to instances where simulators or physical devices are used instead of or in addition to emulators.

The device controller 1016 maintains an open session between each of the emulators in the set of emulation servers 1008 and the master controller 1004. The device controller 1016 also establishes which apps will be installed on which emulators and which app is in the foreground. In addition, the device controller 1016 may determine which apps are running on each emulator.

When simulators are used, the device controller 1016 may only be able to assign a single app to the simulator. In various implementations, the server controller 1012 may start some servers that host hardware emulations and some servers that host simulations. In implementations where the operating system of the server cannot directly host an emulator instance, the server may instead have virtualized environments that support an operating system that can host the emulator. As one example, which may conjure images of a stack of turtles, a cloud provider such as Amazon Web Services (AWS) may provide virtualized servers.

For example only, AWS hosts a virtual server running a variant of the Linux operating system. This virtual server may run a hypervisor or other controller for a virtual machine that runs the OS X operating system. The virtualized OS X operating system provides a simulator that simulates the APIs of an iOS operating system device for the benefit of an app running in the simulator. Subject to licensing restrictions, each virtualized server provided by AWS can theoretically host multiple instances of a virtualized OS X operating system, and each of the virtualized OS X operating system instances can execute multiple simulators, each simulator hosting one app.

In fact, some simulators may allow for multiple apps to be simulated, which may be helpful for developers to simulate interactions between apps. The server controller 1012 or the device controller 1016 may control which app or apps each simulator hosts and may stop or modify a simulator to replace the app being hosted.

The device controller 1016 may also assign network addresses to each emulator using a proxy server 1020. For example, the device controller 1016 may select from a pool of available IP addresses. In some implementations, this pool of available IP addresses may be associated with different geolocation values. As a result, when the emulator contacts a service that infers geolocation from IP address, the emulator can appear to be located in that geographic region. In other implementations, the device controller 1016 may simply assign IP addresses to the emulators in a round-robin manner. The proxy server 1020 interfaces between each of the emulators and the Internet.
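
Both assignment strategies can be sketched with a small address pool, as below. The class name, the region labels, and the addresses (taken from the ranges reserved for documentation) are hypothetical; how a real pool would be obtained or mapped to geolocation is not specified here.

```python
import itertools


class AddressPool:
    """Hands out IP addresses by region when one is requested, else round-robin."""

    def __init__(self, addresses_by_region):
        # One repeating iterator per region, plus one over the whole pool.
        self._by_region = {
            region: itertools.cycle(addresses)
            for region, addresses in addresses_by_region.items()
        }
        every_address = [a for addrs in addresses_by_region.values() for a in addrs]
        self._round_robin = itertools.cycle(every_address)

    def assign(self, region=None):
        if region in self._by_region:
            return next(self._by_region[region])
        return next(self._round_robin)


# Hypothetical pool, for illustration only.
pool = AddressPool({
    "us-west": ["192.0.2.10", "192.0.2.11"],
    "eu": ["198.51.100.20"],
})
```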

A request rate manager 1024 receives each request from the master controller 1004 and applies a rate limit per app. For example, a given app may accept no more than a predetermined number of requests per hour or per day. Once this limit is exceeded, the request rate manager 1024 may indicate that further requests for data from that app cannot be satisfied. If the request does not exceed the threshold, the request rate manager 1024 provides the request to a specified one of the set of emulation servers 1008.
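
As a rough sketch, a per-app limit of this kind could be kept with a fixed time window per app. The fixed-window scheme, the class interface, and the app name are assumptions for illustration; a sliding window or token bucket would serve equally well.

```python
class RequestRateManager:
    """Fixed-window rate limit per app (one simple scheme among several)."""

    def __init__(self, limits, window_seconds=3600.0):
        self.limits = limits                  # app name -> max requests per window
        self.window_seconds = window_seconds
        self._windows = {}                    # app name -> (window start, count)

    def allow(self, app, now):
        """Return True if a request for this app may proceed at time `now`."""
        start, count = self._windows.get(app, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0             # a new window begins
        if count >= self.limits[app]:
            self._windows[app] = (start, count)
            return False                      # limit reached; cannot be satisfied
        self._windows[app] = (start, count + 1)
        return True
```

Passing the current time as an argument, rather than reading a clock inside the method, keeps the sketch deterministic and easy to exercise.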

The master controller 1004 queues requests received from the query router 912 and the cache builder 948 of FIG. 9. The master controller 1004 may give priority to requests from the query router 912, while requests from the cache builder 948 receive a lower priority, as they may be simply updating previous results in anticipation of future queries. The master controller 1004 may instruct the server controller 1012 regarding how many servers to start up as part of the set of emulation servers 1008. The master controller 1004 may instruct the device controller 1016 which apps to install and execute on each emulator in the set of emulation servers 1008.
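
The prioritized queueing described above can be sketched with a binary heap. The priority constants and the class name are hypothetical; the sequence counter is added so that requests of equal priority leave the queue in arrival order.

```python
import heapq
import itertools

# Lower number means higher priority (values are illustrative).
QUERY_ROUTER = 0   # a user is waiting on the result
CACHE_BUILDER = 1  # a background refresh of cached results


class RequestQueue:
    def __init__(self):
        self._heap = []
        self._sequence = itertools.count()  # tie-breaker preserving arrival order

    def push(self, request, priority):
        heapq.heappush(self._heap, (priority, next(self._sequence), request))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```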

In FIG. 11A, an example implementation of an emulation server 1100 (such as one of the set of emulation servers 1008) includes a device manager 1104 that interfaces between emulators 1108-1, 1108-2, . . . 1108-N (collectively, emulators 1108) and the request rate manager 1024 of FIG. 10.

Requests from the request rate manager 1024 are provided to a specified one of the emulators 1108. Results are returned to an output buffer 1028 of the live scrape system 932 via the device manager 1104. In response to the device controller 1016 of FIG. 10, the device manager 1104 provides instructions to the emulators 1108 regarding which apps to install and to run.

In the emulator 1108-1, a device agent 1120 communicates with the device manager 1104. The device agent 1120 may be an installed app but with greater privileges than a regular app. For example, the device agent 1120 may be installed as a launcher app in the emulator 1108-1. In other implementations, the device agent 1120 may be incorporated into an operating system, such as the ANDROID operating system, used in the emulator 1108-1.

App storage 1124 allows for storage of apps that may be executed in order to return results for a query. The apps stored in app storage 1124 may be pre-loaded when the emulator 1108-1 is instantiated or may be added by request of the device manager 1104. The app storage 1124 may be standard storage for apps provided by the operating system or may be a separate repository.

The device agent 1120 uses a user interface (UI) event replay system 1126 to navigate to a specific state of an app from which information is to be extracted. A content scraper 1128 extracts content from the specific state. For example, the content may include text, images, and metadata (which may include explicit metadata, such as image tags, or may be inferred from location or formatting of text or images).

The content scraper 1128 may extract a UI tree of elements from the state and encode that tree using XML (Extensible Markup Language). The content scraper 1128 may also capture screenshots of a state. In some implementations, instead of taking a screenshot of an entire display, individual screenshots may be taken, with each screenshot corresponding to a search result. Each of these screenshots may correspond to or be transformed into deep views (DVs).
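
To make the XML encoding of a UI tree concrete, the sketch below serializes a nested dictionary using Python's standard `xml.etree.ElementTree` module. The dictionary layout, the element class names, and the `resource-id` attribute are assumptions for illustration and do not reflect the output format of any particular operating system tool.

```python
import xml.etree.ElementTree as ET


def ui_tree_to_xml(node):
    """Recursively convert a UI node (a dict) into an XML element."""
    attributes = {key: str(value) for key, value in node.get("attrs", {}).items()}
    element = ET.Element(node["class"], attributes)
    if node.get("text"):
        element.text = node["text"]
    for child in node.get("children", []):
        element.append(ui_tree_to_xml(child))
    return element


# Hypothetical search results state with two results.
ui_tree = {
    "class": "ListView",
    "attrs": {"resource-id": "results"},
    "children": [
        {"class": "TextView", "attrs": {"resource-id": "result_title"},
         "text": "Thai Palace"},
        {"class": "TextView", "attrs": {"resource-id": "result_title"},
         "text": "Noodle Bar"},
    ],
}
xml_text = ET.tostring(ui_tree_to_xml(ui_tree), encoding="unicode")
```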

In some implementations, additional data about a result is desired for creating a DV. In such situations, the search result is not simply scraped but is also followed. When traversing to a new state from a search result, the resulting state may provide additional information. For example only, in a restaurant review app, traversing a search result for a restaurant may lead to additional details about the restaurant, such as hours of operation. A combination of screenshots and textual data (including structured data, such as XML) can be provided as a result of a requested query.

The device agent 1120 controls which apps are executing, which apps are installed, and which app is in the foreground. In various implementations, bringing an app to the foreground or installing an app may be performed using the UI event replay system 1126. In other implementations, installing an app may be performed by providing an install instruction to a built-in app retrieval mechanism such as the GOOGLE PLAY digital distribution platform from Google, Inc. or the APP STORE digital distribution platform from Apple Inc.

The device agent 1120 may also be able to fire (that is, call) methods (also referred to as intents) that direct an app to transition to a particular state. When available, firing a method can be performed instead of navigating using simulated UI events from the UI event replay system 1126. In other situations, such as where a method is not available to directly arrive at a certain state, a method may be fired to perform some of the navigation, followed by the UI event replay system 1126 completing the navigation.

A deep link recorder 1132 may be included to record paths to certain states for later replay to reach those states once again. A link extractor 1136 may be used to identify other states of interest that should be crawled either by a crawling system or within the emulator 1108-1. The emulator 1108-2 and the remaining emulators 1108 may operate similarly to the emulator 1108-1.

In FIG. 11B, an emulation server 1150 may serve as one of the set of emulation servers 1008. The emulation server 1150 includes a device manager 1154 that controls simulators 1158-1, 1158-2, . . . 1158-N (collectively, simulators 1158). The device manager 1154 may receive a request from the request rate manager 1024 of FIG. 10 and provide results to the output buffer 1028 of FIG. 10. The simulators 1158 may be included as part of an operating system of the emulation server 1150. In some implementations, the simulators 1158 may be modified, such as with a software package called Statebit. The simulators 1158, as special cases of emulators, may simulate only some of the functionality of a user device and may allow only a single app to be simulated at a time.

The device manager 1154 includes an app loader 1162 that controls which of the simulators 1158 are responsible for hosting which apps. Packages containing the apps may be stored in app storage 1166. The device manager 1154 may include a content scraper 1170, a link extractor 1174, a UI event replay system 1178, and a deep link recorder 1182, which may be similar to items having the same name in FIG. 11A. The UI event replay system 1178 may use an accessibility framework or a scripting framework of an operating system executing on the emulation server 1150. Similarly, the content scraper 1170 may extract data from one of the simulators 1158 using screenshot and UI enumeration capability built into the operating system for accessibility, scripting, and development purposes.

In FIG. 12, an example implementation of the cache manager 920 of FIG. 9 is shown. Retrieved data from a query is provided to a scraper 1204. The scraper 1204 takes raw data captured from a state, such as screenshot images, and extracts data of interest. For example, the scraper 1204 may assign tags to various pieces of data, may map a UI tree from the data, etc.

The raw data may include, for example, a list of all UI elements and corresponding resource IDs, and any corresponding text and images, including metadata related to formatting, such as placement and resolution of the images and font formatting of the text. The scraper 1204 may not be necessary when scraping, such as within an emulator, has already transformed raw data into scraped data.

In some implementations, a DVC builder 1208 adapts the retrieved datainto a DVC format. The DVC builder 1208 may not be necessary if DVCcontent is directly received. Otherwise, the adaptation performed by theDVC builder 1208 may be dictated by a template from the DVC templatestore 740. In various implementations, the DVC template store 740 mayinstead be a template store specific to the cache manager 920. Forexample, DVCs stored within the cache manager 920 may adopt a genericformat while DVCs provided to a device may have a format that is morespecific to that device. In various implementations, the DVC templatesused in the cache manager include a superset of all data used incorresponding device-specific templates. In other words, all potentiallyrelevant data is cached, and some or all of it is retrieved depending onthe DVC format in which the data will be output.

A storage interface 1212 maintains an index of what results are storedin a results data store 1216. The storage interface 1212 identifieswhether results for the query are stored in the results data store 1216and indicates that fact to a requester. The storage interface 1212 mayinclude lookup tables, hash tables, etc.

In addition, the storage interface 1212 may indicate how fresh the data is. An eviction engine 1220 determines which data to remove from the results data store 1216 to allow for storage of more recent or more popular data. Further, the eviction engine 1220 may remove stale data to reduce the storage burden, which in a cloud-based environment may be charged by storage size.

A freshness monitor 1224 communicates with the query frequency analyzer 944 and determines how frequently to update the results for queries of interest. When a query of interest has reached the time at which the results should be updated, the freshness monitor 1224 sends the query to the cache builder 948.

The freshness monitor 1224 may indicate to the eviction engine 1220 which query results are of interest and should therefore not be removed as readily. In addition, the freshness monitor 1224 may recognize how frequently data is updated. If query results change each time a query is run, the freshness monitor 1224 may run the query more and more frequently until the data no longer changes between consecutive requests. In this way, the freshness monitor 1224 may identify an expected validity lifetime for query results.

When the freshness monitor 1224 recognizes that query results have been present longer than this expected validity lifetime, the freshness monitor 1224 may provide that query to the cache builder 948 to refresh the data. If the validity lifetime is too short, the freshness monitor 1224 may recognize that attempting to cache the data is impractical and therefore instruct the eviction engine 1220 to delete the corresponding results and instruct the storage interface 1212 to always respond that those results are not cached. Further, the scraper 1204 is instructed not to further process those results. For results that do not have individual freshness information, the eviction engine 1220 may remove the results that have been stored in the results data store 1216 for the longest period of time to make way for new results.
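For example only, one way to probe for an expected validity lifetime is sketched below. This is one reading of the probing behavior described above, not the disclosed implementation: the query is rerun at successively halved intervals until two consecutive runs return identical results. The function name, the halving policy, and the injected run_query and wait callables are all assumptions made for illustration.

```python
from typing import Callable, Optional


def estimate_validity_lifetime(
    run_query: Callable[[], object],
    wait: Callable[[float], None],
    start_interval: float,
    min_interval: float,
) -> Optional[float]:
    """Halve the polling interval until consecutive results match.

    Returns the interval at which results stopped changing, or None when
    results still change at min_interval (caching is then impractical).
    """
    interval = start_interval
    previous = run_query()
    while interval >= min_interval:
        wait(interval)
        current = run_query()
        if current == previous:
            return interval
        previous = current
        interval /= 2
    return None


# Simulated clock so the sketch runs instantly; results change every 10 s.
clock = {"now": 0.0}


def fake_wait(seconds: float) -> None:
    clock["now"] += seconds


def fake_query() -> int:
    return int(clock["now"] // 10)


lifetime = estimate_validity_lifetime(fake_query, fake_wait, start_interval=40.0, min_interval=1.0)
```

A return value of None corresponds to the case in which the validity lifetime is too short for caching to be practical.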

In FIG. 13, the device 104 is shown communicating with the search system 100, which in turn communicates with the emulator instances 550. The emulator instances 550 may receive an app (“App A” in this example) from a digital distribution platform 1300. The digital distribution platform 1300 provides apps, and may be specific to an operating system or may provide apps for multiple operating systems. For example only, the digital distribution platform 1300 may include the GOOGLE PLAY digital distribution platform, the APPLE APP STORE digital distribution platform, and the WINDOWS PHONE digital distribution platform.

Based on search results related to App A provided by the search system 100, the device 104 may wish to acquire “App A” from the digital distribution platform 1300 in order to interact further with “App A.” The digital distribution platform 1300 can then supply “App A” to the device 104. Although the solid arrows in FIG. 13 indicate functional data exchange, some or all communication may be by way of a network 1304. The network 1304 may include local area networks, personal area networks, and wide area networks, such as the Internet.

Flowchart

In FIG. 14, a flowchart depicts example operation of an intake process for apps into the search system 100. Control begins at 1404, where if a new app is available for the intake process (referred to as onboarding), control transfers to 1408. Otherwise, control remains at 1404.

At 1408, control determines whether a search function is available in the app. If so, control transfers to 1412; otherwise, control transfers to 1416. Examples of search functions are searching for certain types of establishment within a certain geographical area, searching for reviews of a particular movie, and searching for trivia about a certain event. At 1416, the app is crawled, scraped, and indexed as usual, creating app state records, because search functions are not available to access that content. Control continues at 1404.

At 1412, control selects the first search function that is available in the app. At 1420, control determines a breadcrumb trail to the selected search function. The breadcrumb trail is one or more API calls (such as intents) and user interface interactions that will navigate to the state where the search function can be actuated. At 1424, the search function is parameterized. In other words, a template is created including snippets of text and parameters, where the parameters can be replaced with various types of data in order to instantiate the template.
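For example only, a parameterized search function together with its breadcrumb trail may be represented as sketched below. The class name SearchFunction, the app name, the breadcrumb step strings, and the template text are hypothetical; the sketch uses the Python standard library string.Template class to stand in for the template of text snippets and parameters.

```python
from dataclasses import dataclass
from string import Template
from typing import List


@dataclass
class SearchFunction:
    app: str
    breadcrumb_trail: List[str]   # API calls and UI interactions, in order
    query_template: Template      # text snippets with $parameters

    def instantiate(self, **params: str) -> str:
        """Replace each parameter with supplied data; raises KeyError if one is missing."""
        return self.query_template.substitute(**params)


# Hypothetical onboarding result for a restaurant search function.
restaurant_search = SearchFunction(
    app="ExampleEats",
    breadcrumb_trail=["intent:open_main", "tap:search_tab"],
    query_template=Template("$cuisine restaurants near $city"),
)

instantiated = restaurant_search.instantiate(cuisine="thai", city="Austin")
```

Keeping the breadcrumb trail alongside the template means that one record describes both how to reach the search state and what to supply once there.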

As part of parameterizing the search function, the human operator or automated onboarding system determines which entity type is required for each parameter. For example, a restaurant search function may allow for a specification of a cuisine, meaning that the entity type is cuisine. A movie information search function may take as input the name of a movie or the name of an actor. In this scenario, the search function may be annotated as having two different entity types. In other implementations, two search functions may be defined, where one search function requires a movie type of entity and the other search function requires an actor name type of entity.

At 1428, control identifies whether any of the parameters require enumerated entities. For example, one of the parameters may be a city, and the operator therefore identifies, such as by pointing and clicking, that the parameter will be satisfied by providing a text or numerical value for a city. These may be generic entities of a particular type (for example, city, zip code, movie, year, etc.). In another example, the enumerated entities may be a list of location identifiers unique to the app. These enumerated entities may be extracted from the app, such as by using a special-purpose crawler. Other parameters may allow for plain text and therefore be identified as plain text parameters not requiring one of a set of enumerated entities.
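For example only, the distinction between enumerated-entity parameters and plain text parameters may be sketched as follows. The class name ParameterSpec, its fields, and the sample location identifiers are hypothetical and serve only to illustrate the annotation described above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ParameterSpec:
    name: str
    entity_type: str
    # None marks a plain text parameter; otherwise only listed values are valid.
    enumerated_values: Optional[FrozenSet[str]] = None

    def accepts(self, value: str) -> bool:
        return self.enumerated_values is None or value in self.enumerated_values


# Hypothetical app-specific location identifiers, e.g. gathered by a crawler.
location = ParameterSpec("location", "app_location_id", frozenset({"loc_17", "loc_42"}))
keywords = ParameterSpec("keywords", "plain_text")
```

A query generator could call accepts before instantiating a search function, so that values outside the enumerated set are not sent to the app.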

At 1432, control evaluates the search results generated by the search function and identifies access URLs to arrive at these search results. In other words, once a user of a user device decides to select one of the deep view cards, the corresponding app is opened to the corresponding state. The access URL template allows this to occur. In various implementations, the access URL template may be an http:// namespace template, which will open in an app if the app is present or will open in a web browser if the app is not present.

The access URL templates may be parameterized such that for each set of search results, visible information or hidden metadata will be used to determine an access URL that allows navigation to the search result in the app.
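For example only, instantiating a parameterized access URL template from scraped metadata may be sketched as follows. The function name build_access_url, the example.com template, and the metadata keys are hypothetical; percent-encoding uses the Python standard library function urllib.parse.quote.

```python
from typing import Dict
from urllib.parse import quote


def build_access_url(template: str, result_metadata: Dict[str, str]) -> str:
    """Fill an access URL template with percent-encoded metadata values."""
    encoded = {key: quote(str(value), safe="") for key, value in result_metadata.items()}
    return template.format(**encoded)


# Hypothetical template and hidden metadata scraped from one search result.
access_url_template = "http://example.com/restaurant/{id}?ref={source}"
access_url = build_access_url(access_url_template, {"id": "a b/7", "source": "dvc"})
```

Encoding each value before substitution keeps characters such as spaces and slashes from altering the structure of the resulting URL.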

At 1436, user interface elements of the search results are tagged for deep view card creation. This tagging of user interface elements may be performed using a point and click interface by an operator of the search system 100. For example, the operator may identify a text field within the search results as a title of a search result and identify a graphic as a representative image for the search result. Further, the operator may tag a graphical representation of a number of stars as a rating for the result. These tagged items can then be arranged as desired to generate any DVC requiring these items.

At 1440, control determines whether there are additional search functions to analyze in the app. If so, control transfers to 1444; otherwise, control continues at 1448. At 1444, control selects the next search function and continues at 1420. The breadcrumb trail determined at 1420 is used by an emulator to reach the search function so that the search function can be actuated. Meanwhile, the access URL determined at 1432 is used by the user device to navigate to a desired state corresponding to a selected search result. At 1448, control determines whether there is relevant unsearchable content remaining within the app. If so, control transfers to 1416; otherwise, control returns to 1404.

In FIG. 15, query routing control waits at 1504 until a query is received. The query specifies a specific app, a specific search function of the app, and parameters to supply to that search function. When a query is received, control continues at 1508. If caching is enabled, control transfers to 1512; otherwise, control transfers to 1516. If caching is not even implemented, 1508 can be omitted entirely. At 1512, control determines whether results for the specified query are cached. If so, control transfers to 1520; otherwise, control continues at 1516.

At 1520, control determines whether the cached results are fresh, that is, whether the results have been stored in the cache for less than an amount of time within which those results have historically been updated. If so, control transfers to 1524; otherwise, control continues at 1516. In other words, results being fresh means that they are unlikely to have changed since being cached or at least that the confidence in the results still being accurate is above a certain threshold. Therefore, at 1524, control returns the cached results.

Control may then return to 1504 or continue at optional element 1528. At 1528, control may verify that the cached results are still accurate. This may involve reacquiring the results as described in more detail below. At 1532, if there have been updates to the cached results, control transfers to 1536; otherwise, control returns to 1504. At 1536, control returns the updates to the same destination to which the cached results were sent. The cached results can then be updated, in some cases before the results have even been surfaced to a user. Control then returns to 1504.

At 1516, control determines whether an API has been exposed for the query. The query specifies an app and a desired function. If the app developer has exposed (whether documented or not) an API that the search system is able to call, control continues at 1540; otherwise, control transfers to 1544. At 1540, control determines parameters for the API call based on the query. At 1548, control calls the API with the determined parameters. At 1552, control parses returned results and, at 1556, the parsed results are returned as query results. Control then returns to 1504.

At 1544, control determines whether a web edition of the app specified in the query is available. If so, control transfers to 1560; otherwise, control transfers to 1562. At 1560, control generates a URL (Uniform Resource Locator) based on the query and a URL template corresponding to the app and the function of interest. At 1564, control retrieves web content from the server specified by the URL. Control then continues at 1552.

At 1562, control identifies whether any emulators currently have the app installed. At 1566, if any emulators have been identified, control continues at 1568; otherwise, control transfers to 1570. At 1570, control selects an emulator for app installation. This process may be slow (in some cases, on the order of seconds), so the search system API service may send an interim message indicating that results will be delayed. This message may cause the delayed results to be removed from the results returned to the user device or may cause them to be demoted to a second page or off-screen location. This gives additional time for the results to be obtained before a user may attempt to view them. After installing the app on the emulator, control continues at 1568.

At 1568, if multiple emulators have been identified that each have the app installed, control transfers to 1572; otherwise, control transfers to 1574. At 1572, control determines whether any of the identified emulators already have the app in the foreground and at the relevant state. If so, control transfers to 1576; otherwise, control transfers to 1578. At 1576, control ignores (that is, removes from the identified set) emulators where the app is not already in the foreground at the relevant state. Control then continues at 1578.

At 1578, an emulator is selected from the identified list. The emulator may be selected based on how popular other apps on that emulator are. It may be ideal to select an emulator that can service the request with very little risk that another app installed in the emulator will be needed before the emulator can return results. Control continues at 1574.
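For example only, the emulator selection of elements 1562 through 1578 may be sketched as follows. The Emulator class, the numeric popularity scores, and the rule of summing the popularity of the other installed apps are assumptions made for illustration; the disclosure states only that selection may be based on how popular the other apps are.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Emulator:
    name: str
    installed_apps: Dict[str, float]               # app -> hypothetical popularity score
    foreground: Optional[Tuple[str, str]] = None   # (app, state) currently shown


def select_emulator(emulators: List[Emulator], app: str, state: str) -> Optional[Emulator]:
    """Pick an emulator hosting the app, preferring ones already at the state."""
    candidates = [e for e in emulators if app in e.installed_apps]
    if not candidates:
        return None  # caller would install the app on some emulator first
    ready = [e for e in candidates if e.foreground == (app, state)]
    if ready:
        candidates = ready  # ignore emulators not already at the relevant state

    def contention(e: Emulator) -> float:
        # Popularity of the other apps: a proxy for the risk of being preempted.
        return sum(score for name, score in e.installed_apps.items() if name != app)

    return min(candidates, key=contention)


fleet = [
    Emulator("emu-1", {"AppA": 0.2, "AppB": 0.9}),
    Emulator("emu-2", {"AppA": 0.2, "AppC": 0.1}),
    Emulator("emu-3", {"AppA": 0.2, "AppD": 0.8}, foreground=("AppA", "search")),
]
chosen = select_emulator(fleet, "AppA", "search")
```

In this sketch an emulator already at the relevant state is preferred even when its other apps are popular, mirroring the order of elements 1576 and 1578.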

At 1574, control navigates to the relevant state in the selected emulator as necessary. If the emulator is already at the relevant state, control proceeds immediately to 1580. Navigating to the relevant state may be performed by a combination of method calls and/or user interface event injection. At 1580, control performs a search based on the received query. For example, the received query may specify a text string to supply to the search state of the app and may also specify one or more other options (such as checkboxes or radio buttons) to set.

At 1582, control scrapes the search results state. This may include scraping the visible results from the first state as well as scrolling or transitioning to a subsequent state that offers additional search results. As described above, scraping may include extracting text and metadata as well as taking screenshots, such as screenshots of individual ones of the search results.

At 1584, control optionally actuates one or more of the search results and scrapes data from the resulting state. Control then continues at 1556. For some apps, additional information may only be available by selecting the search result and going to the specific state for that search result. For example, a search result list of restaurants may include information such as location, rating, and a representative image, but the further state may provide additional data of interest to the search system, such as hours of operation.
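For example only, the routing decisions of FIG. 15 (cache, then API, then web edition, then live scrape in an emulator) may be sketched as follows. The Query and CacheEntry classes, the route_query function, and the string labels it returns are hypothetical; the sketch decides only where a query is sent and does not perform any scraping.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Query:
    app: str
    function: str
    params: Tuple[Tuple[str, str], ...]   # (name, value) pairs, hashable


@dataclass
class CacheEntry:
    results: list
    cached_at: float
    expected_lifetime: float   # how long results have historically stayed valid


def route_query(
    query: Query,
    now: float,
    cache: Dict[Query, CacheEntry],
    api_functions: Set[Tuple[str, str]],
    web_apps: Set[str],
    caching_enabled: bool = True,
) -> str:
    """Return the data source that should service the query."""
    if caching_enabled:
        entry = cache.get(query)
        if entry is not None and now - entry.cached_at < entry.expected_lifetime:
            return "cache"            # fresh cached results (1524)
    if (query.app, query.function) in api_functions:
        return "api_passthrough"      # exposed API (1540)
    if query.app in web_apps:
        return "web_scrape"           # web edition (1560)
    return "live_scrape"              # emulator (1562 onward)


q = Query("AppA", "restaurant_search", (("cuisine", "thai"),))
cache = {q: CacheEntry(results=["r1"], cached_at=100.0, expected_lifetime=60.0)}
```

Note that a stale cache entry falls through to the same checks as a cache miss, matching the transfer from 1520 to 1516.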

In FIG. 16, example operation of a caching system, such as that shown in FIGS. 9 and 12, begins at 1604. Predefined queries are established by an operator of the search system based on an expectation of the frequency of various queries from various locations. For example, searches for restaurants may be extremely common for fifty different zip codes, while restaurant searches from the remaining zip codes in a country may be more sporadic. For movies, frequent searches may be expected for upcoming blockbuster movies as well as popular movies already in theaters.

The operator may seed the cache system with these expected queries so the cache system will be ready with results, decreasing latency for the most popular search results. At 1604, when operation of the cache system begins, these initial queries are loaded from the predefined initialization data and, at 1608, the initial queries are added to a cache builder. The cache builder maintains a queue of queries to run and obtain results for caching.

At 1612, the main operation of the cache system begins. If a query has been received, control transfers to 1616; otherwise, control transfers to 1620. At 1616, control records the received query in a query frequency analyzer data store. At 1624, control returns any results that are cached in the data store corresponding to the query. Control then continues at 1620.

At 1620, control analyzes query frequency based on recorded information in the query frequency analyzer. This analysis may include calculating statistics over time and normalizing how frequent a query is with respect to other queries. The statistical analysis of the query frequency may include a moving average filter so that queries that were once frequent but have become rare will gradually decrease in their analyzed frequency.
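For example only, a moving average over per-period query counts, together with normalization against other queries, may be sketched as follows. The class name QueryFrequencyAnalyzer, the fixed-length window, and the notion of explicitly closed counting periods are assumptions made for illustration.

```python
from collections import defaultdict, deque


class QueryFrequencyAnalyzer:
    """Tracks per-period counts and averages them over a sliding window."""

    def __init__(self, window: int = 3) -> None:
        self.history = defaultdict(lambda: deque(maxlen=window))
        self.current = defaultdict(int)

    def record(self, query: str) -> None:
        self.current[query] += 1

    def close_period(self) -> None:
        # Every known query gets a count for the period, including zero.
        for query in set(self.history) | set(self.current):
            self.history[query].append(self.current.get(query, 0))
        self.current.clear()

    def average(self, query: str) -> float:
        counts = self.history.get(query)
        return sum(counts) / len(counts) if counts else 0.0

    def normalized(self, query: str) -> float:
        total = sum(self.average(q) for q in self.history)
        return self.average(query) / total if total else 0.0


analyzer = QueryFrequencyAnalyzer(window=2)
for _ in range(4):
    analyzer.record("a")
analyzer.record("b")
analyzer.close_period()
first_average = analyzer.average("a")
first_share = analyzer.normalized("a")
analyzer.close_period()   # a quiet period: "a" is no longer queried
second_average = analyzer.average("a")
analyzer.close_period()
third_average = analyzer.average("a")
```

Because old counts leave the window one period at a time, the analyzed frequency of a query that has become rare declines gradually rather than dropping at once.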

Query frequency may be measured at different levels of granularity. For example, the query frequency may be analyzed at the level of which app and function are specified in the query. Additionally or alternatively, more granular analysis may be performed, such as on specific values provided to the app for that function. At 1628, control determines whether there are any queries whose normalized frequency exceeds a threshold. If so, control transfers to 1632; otherwise, control transfers to 1636.

At 1632, control flags any query that exceeds the threshold as a common query. This will cause the query to be refreshed so that up-to-date data is generally present in the cache. Control then continues at 1636. At 1636, control determines whether there is any common query for which fresh results are not cached. If so, control transfers to 1640; otherwise, control transfers to 1644. At 1640, control adds any common query for which up-to-date results are not cached to the cache builder. Control then continues at 1644.

At 1644, control determines whether the cache builder has an empty queue. If so, control continues at 1648; otherwise, control transfers to 1652. At 1652, control routes an entry from the cache builder to a data source. For example, routing decisions such as those shown in FIG. 15 may determine whether the entry is routed to systems including the web scrape system 924, the API passthrough 928, and the live scrape system 932 of FIG. 9. Control then continues at 1648.

At 1648, if results have been received, such as from an entry submitted by the cache builder at 1652 or from a query requested by a user, control transfers to 1656; otherwise, control returns to 1612. At 1656, control updates freshness statistics related to the results. For example, control may compare the received results, and their timestamp, against prior results cached for the same search to determine whether the results have changed. If results for a certain query frequently change over the course of a time period, it is likely that the useful time for keeping those results around is less than this time period.

As the time period between acquiring varying results decreases, there may be a lower threshold below which the cache system will not update cache results. If results to a query frequently change faster than this minimum interval, the cache system may determine that caching the query is not useful. One example implementation of this may be to set the expiration of these search results to zero, indicating that they should not be cached.
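For example only, deriving an expiration from observed changes, with a zero expiration marking results that change faster than a minimum interval, may be sketched as follows. The function name estimate_expiration and the conservative choice of the shortest observed gap are assumptions; other statistics (such as a mean or percentile) could be substituted.

```python
from typing import List, Optional


def estimate_expiration(change_times: List[float], min_interval: float) -> Optional[float]:
    """Estimate how long results stay valid from the times they were seen to change.

    Returns None when there is too little history, 0.0 when results change
    faster than min_interval (do not cache), else the shortest observed gap.
    """
    if len(change_times) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(change_times, change_times[1:])]
    shortest = min(gaps)
    return 0.0 if shortest < min_interval else shortest


slow_changing = estimate_expiration([0.0, 100.0, 250.0], min_interval=60.0)
fast_changing = estimate_expiration([0.0, 10.0, 30.0], min_interval=60.0)
```

An expiration of zero can then be used as the signal that attempting to cache results for the query is not worthwhile.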

At 1660, control determines whether the cache is full. This determination may be made based on whether a proportion of an allocated space for the cache results that is already in use exceeds a threshold percentage. If the cache is full, control transfers to 1664; otherwise, control transfers to 1668. At 1664, control evicts entries from the cache. The eviction decision may be based at least partially on freshness. Cache entries for results that are past their expected useful lifetime can be removed from the cache without substantial risk.

In fact, entries may be evicted based on freshness even when the cache is not considered to be full. This approach may be especially helpful when there is not a specified amount of storage space reserved for the cache. Minimizing the size of the cache is beneficial because storage may be inexpensive but is rarely free. Control continues at 1668.

At 1668, the received results are selectively cached. For example, results corresponding to a query with a very short expiration (indicating that attempting to cache results is fruitless) are not cached, but remaining results are. In various implementations, results are only cached once the normalized query frequency for the corresponding query increases above a threshold. This threshold may be lower than the threshold at 1628 that will cause a query to be considered common. Control then returns to 1612.
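For example only, the eviction of element 1664 and the selective caching of element 1668 may be sketched as follows. The function names, the dictionary-based cache, the entry-count notion of capacity, and the oldest-first fallback are assumptions made for illustration.

```python
from typing import Dict


def evict(cache: Dict[str, dict], now: float, capacity: int, full_fraction: float = 0.9) -> None:
    """Drop expired entries, then the oldest entries while the cache is 'full'."""
    expired = [key for key, entry in cache.items() if now - entry["cached_at"] >= entry["lifetime"]]
    for key in expired:
        del cache[key]
    while cache and len(cache) > capacity * full_fraction:
        oldest = min(cache, key=lambda key: cache[key]["cached_at"])
        del cache[oldest]


def maybe_cache(cache: Dict[str, dict], key: str, results: list, now: float,
                lifetime: float, normalized_frequency: float, cache_threshold: float) -> bool:
    """Cache results unless they expire immediately or the query is too rare."""
    if lifetime <= 0 or normalized_frequency < cache_threshold:
        return False
    cache[key] = {"results": results, "cached_at": now, "lifetime": lifetime}
    return True


store: Dict[str, dict] = {}
kept_q1 = maybe_cache(store, "q1", ["r"], 0.0, 100.0, 0.20, 0.10)
kept_q2 = maybe_cache(store, "q2", ["r"], 0.0, 0.0, 0.90, 0.10)     # zero expiration
kept_q3 = maybe_cache(store, "q3", ["r"], 5.0, 100.0, 0.05, 0.10)   # too rare
maybe_cache(store, "q4", ["r"], 10.0, 1000.0, 0.50, 0.10)
maybe_cache(store, "q5", ["r"], 20.0, 1000.0, 0.50, 0.10)
evict(store, now=150.0, capacity=2, full_fraction=0.5)
```

Removing expired entries first means that entries past their expected useful lifetime are always the first to go, whether or not the cache is considered full.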

CONCLUSION

The foregoing description is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. The broad teachings of the disclosure can be implemented in a variety of forms. Therefore, while this disclosure includes particular examples, the true scope of the disclosure should not be so limited since other modifications will become apparent upon a study of the drawings, the specification, and the following claims. It should be understood that one or more steps within a method may be executed in different order (or concurrently) without altering the principles of the present disclosure. Further, although each of the embodiments is described above as having certain features, any one or more of those features described with respect to any embodiment of the disclosure can be implemented in and/or combined with features of any of the other embodiments, even if that combination is not explicitly described. In other words, the described embodiments are not mutually exclusive, and permutations of one or more embodiments with one another remain within the scope of this disclosure.

Spatial and functional relationships between elements (for example, between modules) are described using various terms, including “connected,” “engaged,” “interfaced,” and “coupled.” Unless explicitly described as being “direct,” when a relationship between first and second elements is described in the above disclosure, that relationship encompasses a direct relationship where no other intervening elements are present between the first and second elements, and also an indirect relationship where one or more intervening elements are present (either spatially or functionally) between the first and second elements. As used herein, the phrase at least one of A, B, and C should be construed to mean a logical (A OR B OR C), using a non-exclusive logical OR, and should not be construed to mean “at least one of A, at least one of B, and at least one of C.”

In this application, including the definitions below, the term ‘module’ or the term ‘controller’ may be replaced with the term ‘circuit.’ The term ‘module’ may refer to, be part of, or include processor hardware (shared, dedicated, or group) that executes code and memory hardware (shared, dedicated, or group) that stores code executed by the processor hardware.

The module may include one or more interface circuits. In some examples, the interface circuits may include wired or wireless interfaces that are connected to a local area network (LAN), the Internet, a wide area network (WAN), or combinations thereof. The functionality of any given module of the present disclosure may be distributed among multiple modules that are connected via interface circuits. For example, multiple modules may allow load balancing. In a further example, a server (also known as remote, or cloud) module may accomplish some functionality on behalf of a client module.

The term code, as used above, may include software, firmware, and/or microcode, and may refer to programs, routines, functions, classes, data structures, and/or objects. Shared processor hardware encompasses a single microprocessor that executes some or all code from multiple modules. Group processor hardware encompasses a microprocessor that, in combination with additional microprocessors, executes some or all code from one or more modules. References to multiple microprocessors encompass multiple microprocessors on discrete dies, multiple microprocessors on a single die, multiple cores of a single microprocessor, multiple threads of a single microprocessor, or a combination of the above.

Shared memory hardware encompasses a single memory device that stores some or all code from multiple modules. Group memory hardware encompasses a memory device that, in combination with other memory devices, stores some or all code from one or more modules.

The term memory hardware is a subset of the term computer-readable medium. The term computer-readable medium, as used herein, does not encompass transitory electrical or electromagnetic signals propagating through a medium (such as on a carrier wave); the term computer-readable medium is therefore considered tangible and non-transitory. Non-limiting examples of a non-transitory computer-readable medium are nonvolatile memory devices (such as a flash memory device, an erasable programmable read-only memory device, or a mask read-only memory device), volatile memory devices (such as a static random access memory device or a dynamic random access memory device), magnetic storage media (such as an analog or digital magnetic tape or a hard disk drive), and optical storage media (such as a CD, a DVD, or a Blu-ray Disc).

The apparatuses and methods described in this application may be partially or fully implemented by a special purpose computer created by configuring a general purpose computer to execute one or more particular functions embodied in computer programs. The functional blocks and flowchart elements described above serve as software specifications, which can be translated into the computer programs by the routine work of a skilled technician or programmer.

The computer programs include processor-executable instructions that are stored on at least one non-transitory computer-readable medium. The computer programs may also include or rely on stored data. The computer programs may encompass a basic input/output system (BIOS) that interacts with hardware of the special purpose computer, device drivers that interact with particular devices of the special purpose computer, one or more operating systems, user apps, background services, background apps, etc.

The computer programs may include: (i) descriptive text to be parsed, such as HTML (hypertext markup language) or XML (extensible markup language), (ii) assembly code, (iii) object code generated from source code by a compiler, (iv) source code for execution by an interpreter, (v) source code for compilation and execution by a just-in-time compiler, etc. As examples only, source code may be written using syntax from languages including C, C++, C#, Objective-C, Haskell, Go, SQL, R, Lisp, Java®, Fortran, Perl, Pascal, Curl, OCaml, Javascript®, HTML5, Ada, ASP (active server pages), PHP, Scala, Eiffel, Smalltalk, Erlang, Ruby, Flash®, Visual Basic®, Lua, and Python®.

None of the elements recited in the claims are intended to be a means-plus-function element within the meaning of 35 U.S.C. § 112(f) unless an element is expressly recited using the phrase “means for” or, in the case of a method claim, using the phrases “operation for” or “step for.”

What is claimed is:
 1. A search system comprising: a query receiver configured to receive a first query related to a first application state of a first application, wherein the first query is based on a user-requested search from a user device; a live scrape system configured to: operate a device including an operating system in which an instance of the first application executes; and responsive to the first query, and within the device, (i) navigate to the first application state in the instance of the first application and (ii) extract content from the instance of the first application; and a content provider configured to provide the extracted content to the user device for presentation to a user of the user device in real time.
 2. The search system of claim 1 wherein the device comprises one of: an emulator configured to execute the operating system on emulated hardware; and a simulator configured to simulate interaction of the operating system with the first application.
 3. The search system of claim 1 wherein the live scrape system is configured to, responsive to the first query, and within the device: navigate to the first application state in the instance of the first application; supply search parameters based on the first query to the first application state; initiate a search action from the first application state; and extract content from search results provided by the instance of the first application in response to the search action.
 4. The search system of claim 1 further comprising a query understanding pipeline configured to parse the user-requested search and identify a set of entity types provided in the user-requested search, wherein the first query is generated in response to the identified set of entity types from the user-requested search.
 5. The search system of claim 4 wherein: the user-requested search includes a text string; and the query understanding pipeline is configured to identify one or more substrings of the text string, each substring corresponding to an entity having a corresponding entity type.
 6. The search system of claim 4 further comprising a rules-based search module configured to: store a set of rules, each rule designating (i) an application, (ii) an application state of the designated application through which a search can be performed, and (iii) for each input parameter of the search, a type of entity that can be provided as the input parameter; select a first rule from the set of rules based on a match between the designated entity types for the first rule and the identified set of entity types from the user-requested search; and generate the first query based on the first rule.
 7. The search system of claim 1 wherein the extracted content is provided to the user device in a form of at least one deep view card (DVC), wherein each DVC of the at least one DVC includes content to be displayed to the user including an identification of the first application, an identification of the first application state, and information extracted from the first application state.
 8. The search system of claim 7 further comprising: a DVC builder configured to format the extracted content into the at least one DVC; and a DVC template store configured to store a first template that specifies which portions of the extracted content correspond to fields in the at least one DVC.
 9. The search system of claim 1 further comprising a query router configured to, responsive to the query receiver receiving a second query: determine whether results data corresponding to the second query have been cached; in response to the results data having been cached, selectively provide the results data to the content provider; and in response to the results data not having been cached, selectively provide the second query to the live scrape system.
 10. The search system of claim 9 wherein the query router is configured to, responsive to the query receiver receiving the second query, selectively provide the second query to the live scrape system in response to the cached results data being out of date.
 11. The search system of claim 1 further comprising a query router configured to, responsive to the query receiver receiving a second query: in response to an application programming interface (API) being available for an application state specified by the second query, selectively provide the second query to an API passthrough to be serviced by the available API; and in response to no API being available for the application state specified by the second query, selectively provide the second query to the live scrape system.
 12. The search system of claim 1 further comprising a query router configured to, responsive to the query receiver receiving a second query: in response to a web edition of an application state specified by the second query being available via HTTP (hypertext transfer protocol), selectively provide the second query to a web scrape system to be serviced by the available web edition; and in response to no web edition being available for the application state specified by the second query, selectively provide the second query to the live scrape system.
 13. A computer-implemented method comprising: receiving a first query related to a first application state of a first application, wherein the first query is based on a user-requested search from a user device; operating a device including an operating system in which an instance of the first application executes; responsive to the first query, and within the device, (i) navigating to the first application state in the instance of the first application and (ii) extracting content from the instance of the first application; and providing the extracted content to the user device for presentation to a user of the user device in real time.
 14. The method of claim 13 wherein operating the device comprises one of: operating an emulator configured to execute the operating system on emulated hardware; and operating a simulator configured to simulate interaction of the operating system with the first application.
 15. The method of claim 13 further comprising, responsive to the first query, and within the device: navigating to the first application state in the instance of the first application; supplying search parameters based on the first query to the first application state; initiating a search action from the first application state; and extracting content from search results provided by the instance of the first application in response to the search action.
 16. The method of claim 13 further comprising: parsing the user-requested search; and identifying a set of entity types provided in the user-requested search, wherein the first query is generated in response to the identified set of entity types from the user-requested search.
 17. The method of claim 16 wherein: the user-requested search includes a text string; and the method further comprises identifying one or more substrings of the text string, wherein each substring corresponds to an entity having a corresponding entity type.
 18. The method of claim 16 further comprising: storing a set of rules, each rule designating (i) an application, (ii) an application state of the designated application through which a search can be performed, and (iii) for each input parameter of the search, a type of entity that can be provided as the input parameter; selecting a first rule from the set of rules based on a match between the designated entity types for the first rule and the identified set of entity types from the user-requested search; and generating the first query based on the first rule.
 19. The method of claim 13 wherein: the extracted content is provided to the user device in a form of at least one deep view card (DVC); and each DVC of the at least one DVC includes content to be displayed to the user including an identification of the first application, an identification of the first application state, and information extracted from the first application state.
 20. The method of claim 19 further comprising: formatting the extracted content into the at least one DVC; and storing DVC templates including a first template, wherein the first template specifies which portions of the extracted content correspond to fields in the at least one DVC.
 21. The method of claim 13 further comprising, in response to receiving a second query: determining whether results data corresponding to the second query have been cached; in response to the results data having been cached, selectively providing the cached results data to the user device; and in response to the results data not having been cached, selectively navigating to an application state based on the second query, extracting content, and providing the extracted content to the user device.
 22. The method of claim 21 wherein the selectively navigating to an application state based on the second query, extracting content, and providing the extracted content to the user device are performed in response to the cached results data being out of date.
 23. The method of claim 13 further comprising, in response to receiving a second query corresponding to a second application: in response to an application programming interface (API) being available for an application state specified by the second query, selectively (i) preparing an API call, (ii) transmitting the API call to a backend server for the second application, and (iii) providing content to the user device based on a response to the API call from the backend server; and in response to no API being available for the application state specified by the second query, selectively navigating to an application state based on the second query, extracting content, and providing the extracted content to the user device.
 24. The method of claim 13 further comprising, in response to receiving a second query corresponding to a second application: in response to a web edition of the second application being available, selectively (i) sending an HTTP (hypertext transfer protocol) request to a backend server for the second application, (ii) receiving an HTTP response from the backend server, (iii) scraping data from the HTTP response, and (iv) providing content to the user device based on the scraped data; and in response to no web edition being available for the second application, selectively navigating to an application state based on the second query, extracting content, and providing the extracted content to the user device.
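
For illustration only, and not as part of the claims, the fallback behavior recited separately in claims 21, 23, and 24 might be combined in a single router as sketched below. The names Query, QueryRouter, api_handlers, and web_handlers are hypothetical, and the order in which sources are tried (cache, then API, then web edition, then live scrape) is an assumption; the claims recite each fallback independently and do not specify a combined order. The out-of-date cache check of claim 22 is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Query:
    app: str     # hypothetical application identifier
    state: str   # hypothetical application-state identifier
    params: str  # search parameters derived from the user-requested search


# A handler takes a query and returns extracted content (a plain string here).
Handler = Callable[[Query], str]
StateKey = Tuple[str, str]


class QueryRouter:
    """Hypothetical router; the cache -> API -> web -> live scrape order is assumed."""

    def __init__(
        self,
        live_scrape: Handler,
        api_handlers: Optional[Dict[StateKey, Handler]] = None,
        web_handlers: Optional[Dict[StateKey, Handler]] = None,
    ) -> None:
        self.live_scrape = live_scrape
        self.api_handlers = api_handlers or {}
        self.web_handlers = web_handlers or {}
        self.cache: Dict[Query, str] = {}

    def route(self, query: Query) -> str:
        # Cached results are returned directly (compare claim 21).
        if query in self.cache:
            return self.cache[query]

        key = (query.app, query.state)
        # Prefer an API (claim 23), then a web edition (claim 24);
        # otherwise navigate the live application instance and extract content.
        handler = (
            self.api_handlers.get(key)
            or self.web_handlers.get(key)
            or self.live_scrape
        )
        content = handler(query)
        self.cache[query] = content
        return content
```

In this sketch, a query for an application state with no registered API or web handler falls through to the live scrape handler, and a repeated query is served from the cache without invoking any handler again.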